OPTIMUM ALLOCATION AND VARIANCE COMPONENTS IN
NESTED SAMPLING WITH AN APPLICATION TO
CHEMICAL ANALYSIS


Sophie Marcuse


U.S. Naval Research Laboratory, 
Washington, D. C. 


INTRODUCTION 


A SAMPLING TECHNIQUE frequently used in chemical and physical 
analyses for estimating the mean of a population is that of multiple 
random subsampling, called nested sampling by P. C. Mahalanobis.¹
For instance, when determining the moisture content of cheese, a food 
chemist might wish to select his samples randomly from different lots, 
and again from different cheeses of each lot, and finally make duplicate 
determinations on each cheese. A primary objective in the statistical 
design of such a sampling procedure is to minimize the cost of obtaining 
the sample estimate if the desired degree of precision is fixed, or conversely,
to maximize the precision of the estimate obtained from a
given amount of expenditure including personnel, time, and equipment. 
The question arises as to how the number of sampling units at each 
level should be determined to meet these optimum requirements assum- 
ing equal frequencies in the subclasses. 

It is assumed in this paper that at each classification level, the cost 
is proportional to the number of units sampled at this level, and that 
the cost per sampling unit is known. Thus the total cost is a linear 
function of the numbers of sampling units at the various levels, with co- 
efficients representing the (known) costs per sampling unit at these levels. 
On the other hand, the precision of the mean yielded by the experiment 
can be expressed in terms of the variance of this sample mean; it will 
then also be a linear function of the variances corresponding to each 
level, with coefficients involving the reciprocals of the number of units 
at the various levels. If the variances at the various levels are not 
known, they should be estimated from a preliminary experiment. The
present paper discusses optimum allocation of the sampling units in
nested sampling in terms of 3 levels. As an illustration of an experimental
situation, a numerical example is given involving the estimation
of variance components. In the appendix, the formulas for optimum
allocation in nested sampling with k levels are derived.


¹For reference see M. Ganguli’s paper on Nested Sampling [7].
For concreteness, we consider the above-mentioned specific problem
of planning in the most economical way an experiment in food chemistry 
designed to determine the moisture content of cheese, the subsampling 
levels involving lots, cheeses, and determinations. Clearly, the princi- 
ples elucidated in terms of this particular problem for 3 levels are 
applicable to a wider class of problems involving more levels in sub- 
sampling, as, for instance, by expanding this simplified experiment to 
more than one factory. Also, they may be applied to other than 
chemical investigations involving nested sampling, for instance: in the
determination of the breaking strength of a certain type of bronze, a 
metallurgist may wish to choose random samples from different ladles, 
then again from different molds of each ladle, and make duplicate de- 
terminations on the samples from each mold; in a manufacturing process, 
the subsampling categories may be lots, bags, and batches; in a gunnery 
experiment, test shooting may be done by different operators taking 
a number of observations on different runs; in agricultural investiga- 
tions, the entire area under survey may be subdivided into a large 
number of zones, these in turn into a large number of smaller zones, 
and so on; in studies of spray deposit in insect work, plots, trees, and 
apple samples have been used as subsampling levels [2]. Examples of 
nested sampling in biological and industrial work together with analyses 
of variance components may be found in G. W. Snedecor’s [10] and 
L. H. C. Tippett’s [12] books. In designing a sample survey for esti- 
mating the jute crop in India, P. C. Mahalanobis [9] has used the cost 
function for considerations of optimum allocation and discussed their
general application to large scale sample surveys; principles of optimum 
allocation in nested sampling have been used by M. H. Hansen et al. 
[8] in a sample survey of business involving 2-fold nested sampling 
from finite populations (countries, stores), and by L. H. C. Tippett [12] 
who describes an experiment where in obtaining soil samples from 
counts of cysts, a number of “borings” of soil were taken and then 
several counts made on each boring. 


DEFINITION OF NESTED SAMPLING 


The problem considered is one in which the total population is sub- 
divided into primary sampling units (lots); these in turn are subdivided 
into secondary sampling units (cheeses) on which several measurements 
(determinations) are made representing the tertiary sampling units. 
The nested sample is obtained by selecting at random first $n_1$ primary
(lots), then $n_2$ secondary (cheeses), and finally $n_3$ tertiary sampling
units (determinations) from each of the preceding units, where $n_1$, $n_2$,
$n_3$ represent the class frequencies. A measure of the variance of the
sample mean in terms of the class frequencies is desired. Before de- 
riving it, the structure of the mathematical model will be explained. 

Let $x_{hij}$ denote the j-th determination from the i-th cheese of the
h-th lot. Assuming that the effects of the sampling units at the different
levels are additive, we may describe an individual observation $x_{hij}$ in
nested sampling [7] as:

$$x_{hij} = \mu + \xi_h + \eta_{hi} + \zeta_{hij} \tag{1}$$

h = 1, 2, ..., $n_1$, where h refers to the lot of cheese
i = 1, 2, ..., $n_2$, where i refers to the cheese in each lot
j = 1, 2, ..., $n_3$, where j refers to the determination on each cheese.

The value $\mu$ represents the general population mean and is thus a fixed
constant. The components $\xi_h$, $\eta_{hi}$, $\zeta_{hij}$ are random variables with
means and covariances equal to zero and with variances equal to
$\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$, respectively, called variance components. Thus the
components $\xi_h$, $\eta_{hi}$, $\zeta_{hij}$ represent the effects peculiar to the lots, cheeses,
and determinations, and the variance components the variabilities at
the different levels.


VARIANCE OF SAMPLE MEAN AND ESTIMATION OF VARIANCE COMPONENTS 
IN NESTED SAMPLING 


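From the definition of an individual observation $x_{hij}$ in nested
sampling, given by equation (1), we have for the sample mean

$$\bar{x} = \mu + \frac{\sum_{h=1}^{n_1} \xi_h}{n_1} + \frac{\sum_{h=1}^{n_1} \sum_{i=1}^{n_2} \eta_{hi}}{n_1 n_2} + \frac{\sum_{h=1}^{n_1} \sum_{i=1}^{n_2} \sum_{j=1}^{n_3} \zeta_{hij}}{n_1 n_2 n_3} \tag{2}$$

Then because of the assumptions made for the random variables $\xi_h$,
$\eta_{hi}$, $\zeta_{hij}$ we obtain for the variance of the sample mean

$$\sigma_{\bar{x}}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_1 n_2} + \frac{\sigma_3^2}{n_1 n_2 n_3} \tag{3}$$
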
This expression gives the variance or precision of the sample mean as a
linear function of the reciprocals of $n_1$, $n_1 n_2$, and $n_1 n_2 n_3$, representing
the total number of lots, cheeses, and determinations used. The
coefficients are the variance components $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$, being the variances
encountered at the 3 subsampling levels.
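As a minimal sketch, equation (3) can be evaluated for any set of class frequencies as follows. The function name and the numerical components are illustrative assumptions, not values taken from the paper.

```python
def variance_of_mean(sigma2, n):
    """Variance of the nested-sample mean, equation (3).

    sigma2: variance components, outermost level (lots) first
    n:      class frequencies in the same order
    """
    total, units = 0.0, 1
    for s2, n_i in zip(sigma2, n):
        units *= n_i           # n_1 * ... * n_i sampling units at this level
        total += s2 / units    # each level contributes sigma_i^2 / (n_1 ... n_i)
    return total


# Hypothetical variance components, chosen only for illustration.
print(variance_of_mean([4.0, 1.0, 0.5], [3, 2, 2]))
```

Because the class frequencies enter only through the running products, the same function covers any number of levels.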

As long as the parameter values $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ are unknown, the variance
function $\sigma_{\bar{x}}^2$ in (3) cannot be used for solving the problem to determine
the optimum values of the class frequencies. On the other hand, if a
set of class frequencies were given and used in performing an experiment
in nested sampling, then the unknown parameters $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ could
be estimated from an analysis of variance of the experimental data.
This dilemma² may be evaded by first carrying out a preliminary
experiment in nested sampling³ using a set of arbitrarily chosen class


TABLE 1
ANALYSIS OF VARIANCE IN 3-FOLD NESTED SAMPLING

Source of Variation         | Degrees of freedom       | Mean Square | Expected Mean Square
Primary sampling units      | $n_1^* - 1$              | $MS_1$      | $\sigma_3^2 + n_3^* \sigma_2^2 + n_2^* n_3^* \sigma_1^2$
Secondary sampling units
  within primary units      | $n_1^* (n_2^* - 1)$      | $MS_2$      | $\sigma_3^2 + n_3^* \sigma_2^2$
Tertiary sampling units
  within secondary units    | $n_1^* n_2^* (n_3^* - 1)$ | $MS_3$      | $\sigma_3^2$


frequencies. We will show how the data obtained from such a preliminary
experiment give advance estimates of $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$, say $s_1^2$,
$s_2^2$, $s_3^2$, to be used for estimating the coefficients of the variance
function.

Denote by $n_1^*$, $n_2^*$, $n_3^*$ the given class frequencies of the preliminary
experiment in nested sampling. Perform a customary analysis of
variance on the observed data, as shown in the first 3 columns of table 1,
where $MS_1$, $MS_2$, and $MS_3$ denote the mean squares corresponding to
the primary, secondary, and tertiary sampling units. It can be shown
that the expected values of the mean squares $MS_1$, $MS_2$, and $MS_3$ are
the expressions shown in the last column of table 1.⁴ Considering the
estimates of these expressions by substituting the estimated variance
components $s_1^2$, $s_2^2$, $s_3^2$, we obtain the equations

$$\begin{aligned}
MS_1 &= s_3^2 + n_3^* s_2^2 + n_2^* n_3^* s_1^2 \\
MS_2 &= s_3^2 + n_3^* s_2^2 \\
MS_3 &= s_3^2
\end{aligned} \tag{4}$$


²See M. Friedman’s discussion of a similar situation in planning an experiment ([11], p. 345).

³Or a mixed model design of experiment (e.g. randomized blocks or split plot) which includes the
subsampling categories under consideration. Note that such a design might involve more degrees of
freedom thus increasing the reliability of the estimated variance components ([3], [4]).

⁴Results for any number of sub-samplings and unequal frequencies are given by M. Ganguli [7].
Whence we have the solutions

$$\begin{aligned}
s_1^2 &= \frac{MS_1 - MS_2}{n_2^* n_3^*} \\
s_2^2 &= \frac{MS_2 - MS_3}{n_3^*} \\
s_3^2 &= MS_3
\end{aligned} \tag{5}$$

in which the estimated variance components are expressed in terms of
the mean squares calculated in the analysis of variance table of the
experimental data from nested sampling.⁵ These equations can be
extended from three to k subsamplings by the same reasoning.
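The solutions (5) amount to successive differences of mean squares. A minimal sketch follows; the function name is an assumption, and the mean squares used in the call are those reported for the cheese data in table 3.

```python
def variance_components(ms1, ms2, ms3, n2, n3):
    """Estimated variance components from equations (5).

    ms1, ms2, ms3: mean squares for primary, secondary, tertiary units
    n2, n3:        class frequencies n_2*, n_3* of the preliminary experiment
    """
    s3_sq = ms3
    s2_sq = (ms2 - ms3) / n3
    s1_sq = (ms1 - ms2) / (n2 * n3)
    return s1_sq, s2_sq, s3_sq


# Mean squares of table 3, with duplicate determinations on 2 cheeses per lot.
s1_sq, s2_sq, s3_sq = variance_components(12.9501, 0.1389, 0.1103, n2=2, n3=2)
print(round(s1_sq, 4), round(s2_sq, 4), round(s3_sq, 4))
```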


OPTIMUM ALLOCATION IN 3-FOLD NESTED SAMPLING 


The variance of the sample mean and the total cost expenditure for 
determining it, expressed in terms of the class frequencies, are the two 
functions needed for solving the optimum allocation problem under 
consideration. Considering the case of 3 levels, let $C(n_1, n_2, n_3)$ be the
cost function and $V(n_1, n_2, n_3)$ the variance function, the variables
$n_1$, $n_2$, $n_3$ representing the class frequencies. As given by equation
(6), the cost function $C(n_1, n_2, n_3)$ is assumed to be an additive function
of the costs at the three levels, that is, the costs of $n_1$ primary, $n_1 n_2$
secondary, and $n_1 n_2 n_3$ tertiary sampling units altogether, the cost per
primary, secondary, and tertiary sampling unit being $c_1$, $c_2$, and $c_3$
respectively. The variance function $V(n_1, n_2, n_3)$ is given by equation
(3) showing the variance of the sample mean, $\sigma_{\bar{x}}^2$, in 3-fold nested
sampling; its parameters may be estimated from the data of a preliminary
experiment by the analysis of variance procedure for estimating
variance components as described above. Thus we have:


$$C(n_1, n_2, n_3) = c_1 n_1 + c_2 n_1 n_2 + c_3 n_1 n_2 n_3 \tag{6}$$

$$V(n_1, n_2, n_3) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_1 n_2} + \frac{\sigma_3^2}{n_1 n_2 n_3} \tag{3}$$


The problem of optimum allocation is to minimize $C(n_1, n_2, n_3)$ by
proper choice of $n_1$, $n_2$, $n_3$ subject to the constraint that the allowable


⁵This analysis of the variance components was performed on data from nested sampling, which
is a special case of Model II analysis of variance as shown below. If a similar analysis of variance
components is routinely carried out on data belonging to Model I, the interpretation differs. In
Model II, the computed variance components estimate the variances $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ associated with
random factors, whereas in Model I, these are dummy symbols representing sums of squares of differences
related to the variation of systematic (or fixed) factors ([1], [5]).
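amount of variance is preassigned, say $v$, or to minimize $V(n_1, n_2, n_3)$
by proper choice of $n_1$, $n_2$, $n_3$ subject to the constraint that the total
amount of cost is fixed, say $c$. Let $n_{c1}$, $n_{c2}$, $n_{c3}$ and $n_{v1}$, $n_{v2}$, $n_{v3}$ be
the optimum solutions of the two problems respectively. By applying
Lagrange multipliers it can be shown⁶ that these optimum values of
$n_1$, $n_2$, $n_3$ are

$$\begin{aligned}
n_{c1} &= \frac{\sum_{i=1}^{3} (\sigma_i \sqrt{c_i})}{v} \cdot \frac{\sigma_1}{\sqrt{c_1}} \\
n_{c2} &= \frac{\sigma_2}{\sigma_1} \sqrt{\frac{c_1}{c_2}} \\
n_{c3} &= \frac{\sigma_3}{\sigma_2} \sqrt{\frac{c_2}{c_3}}
\end{aligned} \tag{7}$$

$$\begin{aligned}
n_{v1} &= \frac{c}{\sum_{i=1}^{3} (\sigma_i \sqrt{c_i})} \cdot \frac{\sigma_1}{\sqrt{c_1}} \\
n_{v2} &= \frac{\sigma_2}{\sigma_1} \sqrt{\frac{c_1}{c_2}} \\
n_{v3} &= \frac{\sigma_3}{\sigma_2} \sqrt{\frac{c_2}{c_3}}
\end{aligned} \tag{8}$$
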

The sets of equations (7) and (8) show similar features. Except 
for the first level, the optimum combination of the number of sampling 
units is independent of the given degree of precision or the fixed total 
cost, being the same whether the precision or the amount of cost is 
assigned beforehand. Therefore, when planning an experiment in 
nested sampling the analyst need be concerned with the given cost or 
precision only in selecting the number of primary sampling units. 
Clearly, an increase in funds would be utilized most efficiently, that 
is resulting in the highest possible precision, by a proportional increase
in the number of primary sampling units, and similarly, the most 
economical way for attaining a higher degree of precision would consist 
in choosing a correspondingly greater number of primary sampling 
units. 
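The two sets of formulas, (7) and (8), can be sketched in a few lines. The function names and the numerical inputs below are illustrative assumptions; `sigma` holds the standard deviations (square roots of the variance components) and `c` the unit costs, outermost level first.

```python
from math import sqrt


def allocate_fixed_variance(sigma, c, v):
    """Equations (7): class frequencies minimising cost for a preassigned variance v."""
    weighted = sum(s * sqrt(ci) for s, ci in zip(sigma, c))
    n1 = weighted / v * sigma[0] / sqrt(c[0])
    inner = [sigma[i] / sigma[i - 1] * sqrt(c[i - 1] / c[i]) for i in range(1, len(c))]
    return [n1] + inner


def allocate_fixed_cost(sigma, c, total_cost):
    """Equations (8): class frequencies minimising variance for a fixed total cost."""
    weighted = sum(s * sqrt(ci) for s, ci in zip(sigma, c))
    n1 = total_cost / weighted * sigma[0] / sqrt(c[0])
    inner = [sigma[i] / sigma[i - 1] * sqrt(c[i - 1] / c[i]) for i in range(1, len(c))]
    return [n1] + inner


# Hypothetical standard deviations and unit costs, for illustration only.
sigma, c = [2.0, 1.0, 0.5], [10.0, 3.0, 1.0]
print(allocate_fixed_variance(sigma, c, v=0.5))
print(allocate_fixed_cost(sigma, c, total_cost=100.0))
```

Both calls return the same values for every level after the first, which is the feature noted in the text: only the number of primary units depends on the assigned precision or cost. The results are generally not integers and must be rounded to suit the experiment.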

In many instances, the research analyst might not wish to depend 


⁶See appendix for development of these formulas.
on considerations of optimum allocation in the choice of the frequencies
at all levels, but might prefer to take, for instance, duplicate or triplicate
determinations from each cheese for check purposes, thus preassigning
the class frequency associated with the tertiary sampling unit, $n_3$. If
$n_3$ is prefixed, the corresponding optimum allocation formulas⁷ are


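$$\begin{aligned}
n_{c1} &= \frac{\sigma_1 \sqrt{c_1} + \sqrt{\sigma_2^2 + \sigma_3^2 / n_3}\,\sqrt{c_2 + c_3 n_3}}{v} \cdot \frac{\sigma_1}{\sqrt{c_1}} \\
n_{c2} &= \frac{\sqrt{\sigma_2^2 + \sigma_3^2 / n_3}}{\sigma_1} \sqrt{\frac{c_1}{c_2 + c_3 n_3}}
\end{aligned} \tag{9}$$

in the case that the variance v is given; and

$$\begin{aligned}
n_{v1} &= \frac{c}{\sigma_1 \sqrt{c_1} + \sqrt{\sigma_2^2 + \sigma_3^2 / n_3}\,\sqrt{c_2 + c_3 n_3}} \cdot \frac{\sigma_1}{\sqrt{c_1}} \\
n_{v2} &= \frac{\sqrt{\sigma_2^2 + \sigma_3^2 / n_3}}{\sigma_1} \sqrt{\frac{c_1}{c_2 + c_3 n_3}}
\end{aligned} \tag{10}$$

in the case that the total cost c is given.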
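A minimal sketch of formulas (10), with the number of determinations per cheese fixed beforehand. The function and argument names are assumptions; the inputs in the call are the cost factors and advance estimates of the numerical example in this paper.

```python
from math import sqrt


def allocate_fixed_cost_given_n3(s1_sq, s2_sq, s3_sq, c1, c2, c3, n3, total_cost):
    """Formulas (10): optimum n_1 and n_2 for a fixed total cost when n_3 is prefixed."""
    sigma1 = sqrt(s1_sq)
    sigma2_eff = sqrt(s2_sq + s3_sq / n3)   # secondary and tertiary variability combined
    c2_eff = c2 + c3 * n3                   # cost of one cheese including its determinations
    weighted = sigma1 * sqrt(c1) + sigma2_eff * sqrt(c2_eff)
    n1 = total_cost / weighted * sigma1 / sqrt(c1)
    n2 = sigma2_eff / sigma1 * sqrt(c1 / c2_eff)
    return n1, n2


n1, n2 = allocate_fixed_cost_given_n3(3.2028, 0.0143, 0.1103, 10, 3, 1, n3=2, total_cost=60)
print(round(n1, 2), round(n2, 2))
```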


NUMERICAL EXAMPLE 


The figures shown in table 2 are results from analyses of samples
of cheese for the determination of moisture content.⁸ They will serve
as the preliminary data for obtaining estimates of the variance
components. The experimental set-up in nested sampling involves duplicate
determinations made on 2 cheeses from each of 3 lots, the different
cheeses and the different lots being randomly selected ($n_1^* = 3$, $n_2^* = 2$,
$n_3^* = 2$).

The first 4 columns of table 3 show the results of an analysis of 
variance of these data. In nested sampling the sums of squares may 
be calculated as follows: Consider first table 2 (in which there are 3 
factors: duplicates, cheeses, and lots) and refer to the figures,
representing 1 determination, as “totals.” Subsequently, obtain the totals


⁷See appendix for development of formulas in which all but the first k’ are fixed.

⁸The data are drawn from “Report on Sampling Fat and Moisture in Cheese” by William Horwitz
and Lila F. Knudsen, J. Ass. Off. Agr. Chem., vol. 31 (1948), pp. 300-306; slight modifications have 
been made for illustrative purposes. The author acknowledges the suggestions of Lila F. Knudsen; 
TABLE 2

MOISTURE CONTENT OF 2 CHEESES FROM EACH OF 3 DIFFERENT LOTS,
DETERMINED 2 TIMES

                     Lot
Cheese        I        II       III
   1        39.02    35.74    37.02
            38.79    35.41    36.00
   2        38.96    35.58    35.70
            39.01    35.52    36.04


for the duplicates on each cheese (there remain 2 factors: cheeses and 
lots), and also the totals of the 4 determinations on each lot (there 
remains 1 factor: lots), in addition to the total for the entire table (no 


TABLE 3

ANALYSIS OF VARIANCE OF DATA ON MOISTURE CONTENT OF CHEESE
GIVEN IN TABLE 2

Source of Variation            | Degrees of Freedom | Sum of Squares    | Mean Square       | Expected Mean Square                     | Estimated Variance Components
Lots                           | 2                  | $SS_1 = 25.9001$  | $MS_1 = 12.9501$  | $\sigma_3^2 + 2\sigma_2^2 + 4\sigma_1^2$ | $s_1^2 = 3.2028$
Cheeses within lots            | 3                  | $SS_2 = .4166$    | $MS_2 = .1389$    | $\sigma_3^2 + 2\sigma_2^2$               | $s_2^2 = .0143$
Determinations within cheeses  | 6                  | $SS_3 = .6620$    | $MS_3 = .1103$    | $\sigma_3^2$                             | $s_3^2 = .1103$


factor remains). Denote by $Q_3$, $Q_2$, $Q_1$, and $Q_0$ the sum of squares of
these corresponding totals divided by the number of determinations
making up each total:

$$Q_3 = 39.02^2 + 38.79^2 + \cdots + 35.70^2 + 36.04^2 = 16{,}365.5607$$

$$Q_2 = \frac{77.81^2 + 77.97^2 + 71.15^2 + 71.10^2 + 73.02^2 + 71.74^2}{2} = 16{,}364.8988$$

$$Q_1 = \frac{155.78^2 + 142.25^2 + 144.76^2}{4} = 16{,}364.4821$$

$$Q_0 = \frac{442.79^2}{12} = 16{,}338.5820$$
Then the sums of squares in analysis of variance, $SS_1$, $SS_2$, $SS_3$, are
the successive differences of these expressions:

$$SS_1 = Q_1 - Q_0 = 25.9001$$

$$SS_2 = Q_2 - Q_1 = 0.4166$$

$$SS_3 = Q_3 - Q_2 = 0.6620$$ ⁹


The sums of squares and the corresponding mean squares are shown 
in columns 3 and 4 of table 3. The estimated variance components 
$s_1^2$, $s_2^2$, $s_3^2$, shown in the last column of table 3, follow from equations
(5). These values represent the advance estimates from the pre- 
liminary data to be used in the planning of the experiment. 
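The computation just described can be retraced with a short script. This is only a sketch: the variable names are assumptions, and the data are the figures of table 2 arranged as lots, cheeses within lots, and duplicate determinations within cheeses.

```python
# data[lot][cheese] -> duplicate determinations (table 2)
data = [
    [[39.02, 38.79], [38.96, 39.01]],   # lot I
    [[35.74, 35.41], [35.58, 35.52]],   # lot II
    [[37.02, 36.00], [35.70, 36.04]],   # lot III
]
n1, n2, n3 = len(data), len(data[0]), len(data[0][0])

# Squared totals divided by the number of determinations in each total.
q3 = sum(x * x for lot in data for cheese in lot for x in cheese)
q2 = sum(sum(cheese) ** 2 for lot in data for cheese in lot) / n3
q1 = sum(sum(sum(cheese) for cheese in lot) ** 2 for lot in data) / (n2 * n3)
q0 = sum(x for lot in data for cheese in lot for x in cheese) ** 2 / (n1 * n2 * n3)

# Sums of squares as successive differences, then mean squares.
ss1, ss2, ss3 = q1 - q0, q2 - q1, q3 - q2
ms1 = ss1 / (n1 - 1)
ms2 = ss2 / (n1 * (n2 - 1))
ms3 = ss3 / (n1 * n2 * (n3 - 1))

# Estimated variance components, equations (5).
s3_sq = ms3
s2_sq = (ms2 - ms3) / n3
s1_sq = (ms1 - ms2) / (n2 * n3)

print(round(ss1, 4), round(ss2, 4), round(ss3, 4))
print(round(s1_sq, 4), round(s2_sq, 4), round(s3_sq, 4))
```

Small differences in the last decimal place against the tabulated figures can arise from rounding of intermediate results.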

The problem of designing an experiment with optimum allocation 
may arise in chemical laboratory work, e.g., when it is desired to set 
up in the most economical way routine analyses of samples of cheese 
for the determination of moisture content. In the example under 
consideration we assume that the chemist wants to spend not more 
than 60 dollars altogether to be allocated in such a way that the highest 
precision results; that he requires duplicate determinations for check 
purposes; and that the cost factors per lot, cheese, and determination 
are 10, 3, and 1 dollar respectively. Since these requirements prefix
the class frequency $n_3$ and the total cost $c$, formulas (10) are
appropriate. Substituting $n_3 = 2$, $c = 60$, $c_1 = 10$, $c_2 = 3$, and $c_3 = 1$, and
for the variances $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ their estimates $s_1^2 = 3.2028$, $s_2^2 = 0.0143$,
$s_3^2 = 0.1103$, we obtain:

$$n_{v1} = 5.43, \qquad n_{v2} = 0.21$$

The corresponding integer values have to be chosen in accordance with
the conditions of the experiment. Since $n_2$, the number of cheeses
selected from each lot, must be at least one, the number of lots, $n_1$,
may be reduced. An examination of the integers smaller than $n_{v1}$
shows that $n_1 = 4$ together with $n_2 = 1$ fulfill the required conditions.
Thus 4 lots and 1 cheese give the optimum solution for the problem
under consideration.
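The choice of integer frequencies can be checked by direct enumeration. The sketch below is not the procedure of the paper but a brute-force search under the same cost factors and advance estimates; the function names and the search range are assumptions.

```python
def cost(n1, n2, n3, c=(10, 3, 1)):
    """Total cost, equation (6)."""
    return c[0] * n1 + c[1] * n1 * n2 + c[2] * n1 * n2 * n3


def variance(n1, n2, n3, s_sq=(3.2028, 0.0143, 0.1103)):
    """Estimated variance of the sample mean, equation (3) with advance estimates."""
    return s_sq[0] / n1 + s_sq[1] / (n1 * n2) + s_sq[2] / (n1 * n2 * n3)


budget, n3 = 60, 2   # at most 60 dollars, duplicate determinations

# All integer pairs (n1, n2) that stay within the budget.
feasible = [(n1, n2) for n1 in range(1, 7) for n2 in range(1, 7)
            if cost(n1, n2, n3) <= budget]

# The feasible pair with the smallest variance of the mean.
best = min(feasible, key=lambda pair: variance(pair[0], pair[1], n3))
print(best, cost(*best, n3), round(variance(*best, n3), 4))
```

The search range of 1 to 6 is sufficient here because a single lot with one cheese already costs 15 dollars, so no more than 4 lots fit within 60 dollars.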

The merit of this optimum combination may be judged by com- 
paring it to other combinations of class frequencies. In table 4 a 
number of various combinations (columns 1 and 2) are presented 
together with the precision of the sample mean (columns 5 and 6) and 


⁹Using the figures given for $Q_2$, $Q_3$ above, we have $Q_3 - Q_2 = .6619$ instead of .6620. Such a
difference in the last decimal place is due to rounding off results, intermediate computations being carried
out to more decimal places.
TABLE 4
ESTIMATED PRECISION AND COST OF DETERMINING MOISTURE CONTENT OF
CHEESE WHEN A SPECIFIED NUMBER OF LOTS ($n_1$) AND A SPECIFIED NUMBER OF
CHEESES FROM EACH LOT ($n_2$) ARE USED AND TWO DETERMINATIONS ($n_3 = 2$)
ARE MADE ON EACH CHEESE. CONSTANTS USED ARE ADVANCE ESTIMATES
CALCULATED FROM PRELIMINARY DATA (TABLES 2 AND 3).

Formulas used:
$N = n_1 n_2 n_3$
$C = c_1 n_1 + c_2 n_1 n_2 + c_3 n_1 n_2 n_3$
$V = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_1 n_2} + \frac{s_3^2}{n_1 n_2 n_3}$
$CV = \frac{\sqrt{V}}{\bar{x}} \times 100$

Constants used:
$n_3 = 2$
$c_1 = 10$, $c_2 = 3$, $c_3 = 1$
$s_1^2 = 3.2028$, $s_2^2 = 0.0143$, $s_3^2 = 0.1103$
$\bar{x} = 36.90$

Number of               | Expenditure                                | Estimated Precision
Lots     | Cheeses      | Number of          | Total Cost in dollars | Variance of mean | Coefficient of Variation
         |              | Determinations     |                       |                  |
$n_1$    | $n_2$        | $N$                | $C$                   | $V$              | $CV$
(1)      | (2)          | (3)                | (4)                   | (5)              | (6)
5        | 3            | 30                 | 125                   | 0.6452           | 2.18
5        | 2            | 20                 | 100                   | 0.6475           | 2.18
5        | 1            | 10                 | 75                    | 0.6544           | 2.19
4        | 3            | 24                 | 100                   | 0.8065           | 2.43
4        | 2            | 16                 | 80                    | 0.8094           | 2.44
4        | 1            | 8                  | 60                    | 0.8181           | 2.45
3        | 3            | 18                 | 75                    | 1.0753           | 2.81
3        | 2            | 12                 | 60                    | 1.0792           | 2.82
3        | 1            | 6                  | 45                    | 1.0907           | 2.83
2        | 3            | 12                 | 50                    | 1.6130           | 3.44
2        | 2            | 8                  | 40                    | 1.6188           | 3.45
2        | 1            | 4                  | 30                    | 1.6361           | 3.47
1        | 3            | 6                  | 25                    | 3.2259           | 4.87
1        | 2            | 4                  | 20                    | 3.2375           | 4.88
1        | 1            | 2                  | 15                    | 3.2722           | 4.90


the expenditure involved in determining it (columns 3 and 4). Column 
3 shows the total number of determinations made, the total cost is 
given in column 4, and column 6 compares the relative precision of 
the sample mean, indicated by its coefficient of variation, to the absolute
precision in terms of the variance (column 5). Duplicate determinations
are used throughout. It can be seen that the 4-1-2 combination
is more economical than the 3-2-2 combination (the one used in the
preliminary experiment) since it obtains a higher precision but
requires the same cost (60 dollars). Also, the combination 3-2-2 is less
efficient than the combination 3-1-2 since, for nearly the same precision
(a variance of 1.0792 against 1.0907), the latter combination needs half
the number of determinations and requires only 45 dollars instead of
60 dollars. In general, it pays to increase the number of lots instead of
the number of cheeses since the former are more variable.


REMARKS ON NESTED SAMPLING AS A SPECIAL CASE OF 
MODEL II ANALYSIS OF VARIANCE 

The mathematical model of nested sampling as given by the funda- 
mental equation (1) and its assumptions, is closely related to one 
specific mathematical model used in analysis of variance. Two models 
of analysis of variance, usually referred to as Model I and Model II, 
have been discussed recently by S. L. Crump [3] and C. Eisenhart [5]. 
It seems worthwhile to show that, in virtue of the underlying assump- 
tions, nested sampling represents a special case of Model II of analysis 
of variance. 

The two different models of analysis of variance involve the analysis 
of two different types of factors: systematic factors in Model I and 
random factors in Model II. A factor such as “treatment” or “lot”
is a random or a systematic factor depending on the way its variants
are chosen. Here the term “variant” of a factor is used based on Fisher’s
terminology [6]; for instance, the variants of the factor “treatment”
may be “nitrogen” and “phosphate,” and different lots the variants
of the factor “lot.” When an experimenter selects the two treatments
“nitrogen” and “phosphate,” he selects them systematically from a 
population of possible treatments on the basis of subject matter judg- 
ment; on the other hand, when selecting different lots of material for 
studying the effects of the treatments, he generally bases his choice on 
random selection ([5], [10] Chapter 8). Since systematically chosen 
variants produce systematic variation and randomly chosen variants 
random variation, the type of factor may be determined according to 
the issue: systematic or random variation. Usually, “methods” and 
“treatments” represent systematic factors, “blocks” and “lots” random 
factors, whereas factors such as “days” or “animals” or “locations” 
may represent either systematic or random factors; both types of factor 
will often occur in the same experiment; then the model is a mixed one. 
Now the factors encountered in nested sampling are the primary,
secondary, tertiary sampling units (lots, cheeses, determinations). 
Under the assumptions made, the variants of these factors, i.e. the 
units selected at each level, were chosen randomly. These factors, 
therefore, are random factors and thus nested sampling belongs to 
Model II. 

In order to describe more accurately the relationship of nested 
sampling to Model II of analysis of variance, we subdivide the random 
factors of Model II into two categories: cross classified¹⁰ with respect to
another factor or not. For instance, in the 2 factor “day-animal”
experiment discussed by C. Eisenhart [5] as an example of Model I,
the random factor “animal” is cross classified with respect to the factor
“days,” each of the randomly chosen animals being tested on all days
(the analysis of variance table contains: “Between days,” “Between
animals,” and “Residual” with d — 1, a — 1, and (a — 1)(d — 1)
degrees of freedom respectively). On the other hand, there would be
no cross classification, if on each day a number of animals were randomly
chosen for testing, as for instance in an inoculation experiment affecting
the sensitivity of the animal (the analysis of variance contains: “Between
days,” and “Between animals within days” with d — 1, and
d(a — 1) degrees of freedom respectively). Likewise, no cross
classification would be involved for the random factor “animal” if each
animal were tested on a couple of days which were randomly
selected, as e.g. if only one animal could be tested per day (the analysis
of variance contains: “Between animals,” and “Between days within
animals” with (a — 1), and a(d — 1) degrees of freedom respectively).
Nested sampling represents the second category of Model II in which
the random factors involved are not cross classified since for each
primary sampling unit a number of secondary sampling units is se- 
lected randomly, and so on. The question as to which order of sub- 
sampling should be adopted in the nested sampling procedure, as, for 
instance, whether to use “animals” as primary sampling units and 
“days” as secondary sampling units, or conversely, is a decision to be 
made on the basis of subject matter judgment. 


APPENDIX 


We shall now derive the optimum values of the class frequencies,
given for the three-fold level by formulas (7), (8), (9), and (10), for
the general case of k-fold nested sampling. Instead of solving the problem
directly by introducing the Lagrange multiplier, we will apply
this procedure to a pair of generalized functions. We then obtain as
special cases the solution formulas for optimum allocation in
i. k-fold nested sampling
ii. k-fold nested sampling in which some class frequencies are fixed
beforehand
iii. stratified sampling from finite population (k strata, 2 levels).

¹⁰This term is not synonymous with “ordered.” Note that items in table 2 below are ordered for
purely designative reasons, there being neither a cross classification nor an element of “sequence”
involved.

a. Minimum Problem for 2 Generalized Functions 


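Let the two generalized functions be

$$F_1(N_1, \ldots, N_k) = \sum_{i=1}^{k} a_{1i} N_i + a_1 \tag{11}$$

$$F_2(N_1, \ldots, N_k) = \sum_{i=1}^{k} \frac{a_{2i}}{N_i} + a_2 \tag{12}$$

where $N_1, \ldots, N_k$ denote variables and $a_1$, $a_2$, and $a_{1i}$, $a_{2i}$ ($i = 1, \ldots, k$)
are constants.

Consider first the problem to minimize $F_1(N_1, \ldots, N_k)$ subject to
the side condition

$$F_2(N_1, \ldots, N_k) = b_2 \tag{13}$$

where $b_2$ is a constant. Using the Lagrange multiplier $\lambda$ in the usual
way, we let the derivatives of $F_1 + \lambda F_2$ with respect to $N_i$ ($i = 1, \ldots, k$)
be zero, and obtain

$$a_{1i} - \frac{\lambda a_{2i}}{N_i^2} = 0$$

or

$$N_i = \sqrt{\lambda} \sqrt{\frac{a_{2i}}{a_{1i}}}$$

Substituting these values of $N_i$ in (13), where $F_2$ is given by (12), we have

$$F_2(N_1, \ldots, N_k) = \frac{1}{\sqrt{\lambda}} \sum_{i=1}^{k} \sqrt{a_{1i} a_{2i}} + a_2 = b_2$$

Therefore

$$\sqrt{\lambda} = \frac{\sum_{i=1}^{k} \sqrt{a_{1i} a_{2i}}}{b_2 - a_2}$$

Hence we obtain the optimum values

$$N_{1i} = \frac{\sum_{j=1}^{k} \sqrt{a_{1j} a_{2j}}}{b_2 - a_2} \sqrt{\frac{a_{2i}}{a_{1i}}} \qquad (i = 1, \ldots, k) \tag{14}$$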
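Similarly, we obtain the solution of the problem to minimize $F_2(N_1, \ldots, N_k)$
subject to the side condition

$$F_1(N_1, \ldots, N_k) = b_1 \tag{15}$$

where $b_1$ is a constant:

$$N_{2i} = \frac{b_1 - a_1}{\sum_{j=1}^{k} \sqrt{a_{1j} a_{2j}}} \sqrt{\frac{a_{2i}}{a_{1i}}} \qquad (i = 1, \ldots, k) \tag{16}$$

Now introduce the variables

$$n_1 = N_1, \qquad n_i = \frac{N_i}{N_{i-1}} \qquad (i = 2, \ldots, k) \tag{17}$$

then $N_i = n_1 \cdots n_i$ ($i = 1, \ldots, k$). Substituting the new variables in
(11) and (12), we obtain the functions

$$f_1(n_1, \ldots, n_k) = \sum_{i=1}^{k} a_{1i}\, n_1 \cdots n_i + a_1 \tag{18}$$

$$f_2(n_1, \ldots, n_k) = \sum_{i=1}^{k} \frac{a_{2i}}{n_1 \cdots n_i} + a_2 \tag{19}$$

Substituting (14) in (17), we find that the minimum solutions of $f_1(n_1, \ldots, n_k)$
under the side condition $f_2(n_1, \ldots, n_k) = b_2$ are:

$$n_{11} = \frac{\sum_{i=1}^{k} \sqrt{a_{1i} a_{2i}}}{b_2 - a_2} \sqrt{\frac{a_{21}}{a_{11}}}$$

and (20)

$$n_{1i} = \sqrt{\frac{a_{2i}\, a_{1,i-1}}{a_{1i}\, a_{2,i-1}}} \qquad (i = 2, \ldots, k)$$

Similarly, substituting (16) in (17), we find the minimum solutions of
$f_2(n_1, \ldots, n_k)$ under the side condition $f_1(n_1, \ldots, n_k) = b_1$:

$$n_{21} = \frac{b_1 - a_1}{\sum_{i=1}^{k} \sqrt{a_{1i} a_{2i}}} \sqrt{\frac{a_{21}}{a_{11}}}$$

and (21)

$$n_{2i} = \sqrt{\frac{a_{2i}\, a_{1,i-1}}{a_{1i}\, a_{2,i-1}}} \qquad (i = 2, \ldots, k)$$

Note that $n_{1i} = n_{2i}$ ($i = 2, \ldots, k$).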
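The general solutions (14) and (16), together with the change of variables (17), can be sketched as follows. The function names and the numerical constants are illustrative assumptions only.

```python
from math import sqrt


def minimise_f1(a1, a2, a2_const, b2):
    """Equation (14): values N_i minimising F_1 subject to F_2 = b2."""
    root_sum = sum(sqrt(p * q) for p, q in zip(a1, a2))
    return [root_sum / (b2 - a2_const) * sqrt(q / p) for p, q in zip(a1, a2)]


def minimise_f2(a1, a2, a1_const, b1):
    """Equation (16): values N_i minimising F_2 subject to F_1 = b1."""
    root_sum = sum(sqrt(p * q) for p, q in zip(a1, a2))
    return [(b1 - a1_const) / root_sum * sqrt(q / p) for p, q in zip(a1, a2)]


def class_frequencies(N):
    """Equation (17): n_1 = N_1 and n_i = N_i / N_(i-1)."""
    return [N[0]] + [N[i] / N[i - 1] for i in range(1, len(N))]


# Hypothetical coefficients a_1i, a_2i and constants, for illustration only.
a1, a2 = [10.0, 3.0, 1.0], [4.0, 1.0, 0.25]
print(class_frequencies(minimise_f1(a1, a2, a2_const=0.1, b2=0.6)))
print(class_frequencies(minimise_f2(a1, a2, a1_const=2.0, b1=100.0)))
```

The two printed lists differ only in their first entry, in line with the note that $n_{1i} = n_{2i}$ for $i = 2, \ldots, k$.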
b. Application to Optimum Allocation Problems in Sampling
i. Nested Sampling


Substituting $a_{1i} = c_i$, $a_{2i} = \sigma_i^2$ and $a_1 = a_2 = 0$ in (18) and (19), we
obtain the 2 functions

$$g_1(n_1, \ldots, n_k) = \sum_{i=1}^{k} c_i\, n_1 \cdots n_i \tag{22}$$

$$g_2(n_1, \ldots, n_k) = \sum_{i=1}^{k} \frac{\sigma_i^2}{n_1 \cdots n_i} \tag{23}$$

These functions represent the general case of the cost function $C(n_1, n_2, n_3)$
and the variance function $V(n_1, n_2, n_3)$ used above in section 4.

Setting $b_1 = c$ and $b_2 = v$ yields the corresponding side conditions.
Therefore applying formulas (20) and (21), we have as the minimum
solutions of $g_1(n_1, \ldots, n_k)$ under the side condition $g_2(n_1, \ldots, n_k) = v$

$$n_{c1} = \frac{\sum_{i=1}^{k} (\sigma_i \sqrt{c_i})}{v} \cdot \frac{\sigma_1}{\sqrt{c_1}}$$

and (24)

$$n_{ci} = \frac{\sigma_i}{\sigma_{i-1}} \sqrt{\frac{c_{i-1}}{c_i}} \qquad (i = 2, \ldots, k)$$

and as the minimum solutions of $g_2(n_1, \ldots, n_k)$ under the side condition
$g_1(n_1, \ldots, n_k) = c$

$$n_{v1} = \frac{c}{\sum_{i=1}^{k} (\sigma_i \sqrt{c_i})} \cdot \frac{\sigma_1}{\sqrt{c_1}}$$

and (25)

$$n_{vi} = \frac{\sigma_i}{\sigma_{i-1}} \sqrt{\frac{c_{i-1}}{c_i}} \qquad (i = 2, \ldots, k)$$

Specializing equations (24) and (25) to the case k = 3 yields equations
(7) and (8). Specializing equation (25) to the case k = 2 and
letting cost be expressed in terms of time, $c_1 = Kt$, $c_2 = t$, gives equation
10.32 in L. H. C. Tippett’s book [12].


ii. Nested Sampling with Some Prefixed Class Frequencies 


Let $n'_1, \ldots, n'_{k'}$ be the unknown frequencies and $n_{k'+1}, \ldots, n_k$ be
fixed beforehand. The equations (22) and (23) may then be rewritten
in terms of $n'_1, \ldots, n'_{k'}$ as follows:

$$h_1(n'_1, \ldots, n'_{k'}) = \sum_{j=1}^{k'} c_j\, n'_1 \cdots n'_j + n'_1 \cdots n'_{k'} \sum_{l=1}^{k-k'} c_{k'+l}\, n_{k'+1} \cdots n_{k'+l} = \sum_{j=1}^{k'} c'_j\, n'_1 \cdots n'_j \tag{26}$$

where $c'_j = c_j$ ($j = 1, \ldots, k' - 1$)

and (27)

$$c'_{k'} = c_{k'} + \sum_{l=1}^{k-k'} c_{k'+l}\, n_{k'+1} \cdots n_{k'+l}$$

$$h_2(n'_1, \ldots, n'_{k'}) = \sum_{j=1}^{k'} \frac{\sigma_j^2}{n'_1 \cdots n'_j} + \frac{1}{n'_1 \cdots n'_{k'}} \sum_{l=1}^{k-k'} \frac{\sigma_{k'+l}^2}{n_{k'+1} \cdots n_{k'+l}} = \sum_{j=1}^{k'} \frac{\sigma_j'^2}{n'_1 \cdots n'_j} \tag{28}$$

where $\sigma_j'^2 = \sigma_j^2$ ($j = 1, \ldots, k' - 1$)

and (29)

$$\sigma_{k'}'^2 = \sigma_{k'}^2 + \sum_{l=1}^{k-k'} \frac{\sigma_{k'+l}^2}{n_{k'+1} \cdots n_{k'+l}}$$

Thus the functions $h_1$ and $h_2$ of the variables $n'_1, \ldots, n'_{k'}$, given by (26)
and (28), represent the same types of function as the functions $g_1$ and $g_2$ of
the variables $n_1, \ldots, n_k$ given by (22) and (23). Therefore the minimum
solutions of $h_1(n'_1, \ldots, n'_{k'})$ and $h_2(n'_1, \ldots, n'_{k'})$ under the side
conditions $h_2(n'_1, \ldots, n'_{k'}) = v$ and $h_1(n'_1, \ldots, n'_{k'}) = c$ respectively,
may be obtained from equations (24) and (25) by replacing k by k’,
$\sigma$ by $\sigma'$, and c by c’, and then substituting back $\sigma'_j$ and $c'_j$ ($j = 1, \ldots, k'$)
from equations (27) and (29).

For k = 3, k’ = 2 we obtain from (27) and (29)

$$c'_1 = c_1, \qquad c'_2 = c_2 + c_3 n_3$$

$$\sigma_1'^2 = \sigma_1^2, \qquad \sigma_2'^2 = \sigma_2^2 + \frac{\sigma_3^2}{n_3}$$

The substitution of these values into (24) and (25) after replacement of
k, c, $\sigma$ by k’, c’, $\sigma'$ gives the formulas (9) and (10) used above.

Note that the results of b. ii. may also be obtained from a. and then
b. i. be considered as the special case k’ = k.
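The reduction in (27) and (29) folds the prefixed levels into the last free level. A minimal sketch follows, assuming at least one level is left free; the function name is hypothetical, and the call uses the cost factors and advance estimates of the numerical example with $n_3 = 2$.

```python
def collapse_fixed_levels(c, sigma_sq, fixed):
    """Effective costs (27) and variance components (29).

    c, sigma_sq: unit costs and variance components for all k levels
    fixed:       prefixed class frequencies n_(k'+1), ..., n_k
    Assumes len(fixed) < len(c), so that at least one level remains free.
    """
    k_free = len(c) - len(fixed)
    c_eff = list(c[:k_free])
    s_eff = list(sigma_sq[:k_free])
    units = 1
    for l, n in enumerate(fixed):
        units *= n                                    # n_(k'+1) * ... * n_(k'+l)
        c_eff[-1] += c[k_free + l] * units            # added to c'_(k')
        s_eff[-1] += sigma_sq[k_free + l] / units     # added to sigma'_(k')^2
    return c_eff, s_eff


c_eff, s_eff = collapse_fixed_levels([10, 3, 1], [3.2028, 0.0143, 0.1103], fixed=[2])
print(c_eff, s_eff)
```

The returned lists can then be passed to any routine implementing (24) or (25) for k’ levels.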
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Or 


ui. Stratified Sampling from Finite Populations 


We will indicate briefly the applicability of the above used general- 
ized functions to stratified sampling involving two levels. 

Let there be k strata in the population with M, elements x;; m the 
7th stratum (¢ = 1, --- ,k;j7 = 1, --- , M;). Assume that the N; 
sample elements z,;; (¢ = 1, --- ,k;7 =1,---, N;) are independently 
drawn at random from the k finite strata. Then the sample mean 


has the variance 


where WM = bio M, and oc; denotes the variance between elements in 
the 7-th stratum. Thus we have 


k 

2 a2; 

oz = Diag + a, 
i=1 447 


Mia? fon Mey Se al op tied 
CM ie ee Pea acl 


where ihee 


Let $c_i$ be the cost per element in the $i$-th stratum and $c = \sum_{i=1}^{k} c_i N_i$
the total cost; then $c$ may be written $c = \sum_{i=1}^{k} a_{1i} N_i + a_1$ where $a_{1i} = c_i$
and $a_1 = 0$. Thus $c$ and $\sigma_{\bar{x}}^2$ correspond to the functions $F_1(N_1, \cdots, N_k)$
and $F_2(N_1, \cdots, N_k)$ respectively in (11) and (12). Therefore equations
(14) and (16) give the desired minimum solutions where $b_1$ and $b_2$ de-
termine the side conditions corresponding to (13) and (15). In case the
populations in the strata are large ($M_i \approx M_i - 1$), we obtain the well
known optimum allocation formulas:


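$$N_i = \frac{M_i \sigma_i}{\sqrt{c_i}} \cdot \frac{\sum_{j=1}^{k} M_j \sigma_j \sqrt{c_j}}{M^2 b_1 + \sum_{j=1}^{k} M_j \sigma_j^2},$$

$$N_i = \frac{b_2 M_i \sigma_i}{\sqrt{c_i} \, \sum_{j=1}^{k} M_j \sigma_j \sqrt{c_j}}.$$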
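
The two large-strata allocation formulas can be sketched in Python as follows. The function names, the reading of b_1 as the prescribed variance and b_2 as the prescribed total cost, and the numerical inputs are assumptions made for illustration; the sample sizes are returned as real numbers and would be rounded in practice.

```python
import math


def allocate_fixed_variance(M, sigma, cost, b1):
    """Minimum-cost sample sizes N_i when the variance of the mean is fixed at b1."""
    total = sum(M)
    numerator = sum(m * s * math.sqrt(c) for m, s, c in zip(M, sigma, cost))
    denominator = total ** 2 * b1 + sum(m * s ** 2 for m, s in zip(M, sigma))
    return [m * s / math.sqrt(c) * numerator / denominator
            for m, s, c in zip(M, sigma, cost)]


def allocate_fixed_cost(M, sigma, cost, b2):
    """Minimum-variance sample sizes N_i when the total cost is fixed at b2."""
    denominator = sum(m * s * math.sqrt(c) for m, s, c in zip(M, sigma, cost))
    return [b2 * m * s / (math.sqrt(c) * denominator)
            for m, s, c in zip(M, sigma, cost)]


def variance_large_strata(M, sigma, N):
    """Variance of the stratified mean with M_i - 1 replaced by M_i."""
    total = sum(M)
    between = sum(m ** 2 * s ** 2 / n for m, s, n in zip(M, sigma, N))
    correction = sum(m * s ** 2 for m, s in zip(M, sigma))
    return (between - correction) / total ** 2


# Hypothetical strata: sizes, standard deviations, and unit costs.
M = [4000, 2500, 1500]
sigma = [3.0, 5.0, 8.0]
cost = [1.0, 2.0, 4.0]

N_var = allocate_fixed_variance(M, sigma, cost, b1=0.05)
N_cost = allocate_fixed_cost(M, sigma, cost, b2=500.0)
print([round(n, 1) for n in N_var], variance_large_strata(M, sigma, N_var))
print([round(n, 1) for n in N_cost], sum(c * n for c, n in zip(cost, N_cost)))
```

In both cases N_i is proportional to M_i sigma_i / sqrt(c_i); only the constant of proportionality changes with the side condition.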
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FITTING A STRAIGHT LINE WHEN BOTH VARIABLES 
ARE SUBJECT TO ERROR 


M. S. BARTLETT
University of Manchester, England 


INTRODUCTION 


A METHOD of fitting a straight line when both variables are subject
to error was examined by Wald (1) in 1940. The purpose of the
present note is to present and illustrate a modification of Wald’s method 
having the advantage in general of greater accuracy. Before any de- 
tailed exposition it will be as well to recall two important points: 


(i) a distinction must be made between the linear regression equation of 
a variable y on a second variable x, and a linear functional relation 
between two variables Y and X masked by errors. The former 
equation is still available for prediction even if the variable x is sub- 
ject to error, but is not necessarily appropriate for a functional rela- 
tion when one exists. 

(ii) it is possible to set up maximum likelihood equations for the second
problem, but they do not lead to a unique solution without further
assumptions, such as an assumption about the relative magnitude of
the errors in x and y.


These points have been emphasized by many previous writers, for 
example, by Wald (1) or more recently by Lindley (2). In view of (ii)
it is useful to consider, in the common case when the observations have 
equal weight, the following elementary method: 


(a) For the location of the fitted straight line use as one point the mean
coordinates $\bar{x}$, $\bar{y}$, just as in the least-squares method.

(b) For the slope, first divide the n plotted points into three groups, the
equal numbers k in the two extreme groups being chosen to be as
near $\tfrac{1}{3}n$ as possible (the three groups are non-overlapping when
considered, say, in the x direction). The join of the mean coordinates
$\bar{x}_1$, $\bar{y}_1$ and $\bar{x}_3$, $\bar{y}_3$ for the two extreme groups is used to determine the
slope.


The only difference from Wald’s original method is the use of three 
groups instead of two, for reasons which will be apparent from the results 
of the next section.¹ It will also be shown that Wald’s confidence interval
method of assessing the accuracy (under suitable conditions) may be 
adapted to the present method. 


EFFICIENCY IN A SPECIAL CASE 


To get some idea of the efficiency of the method its accuracy is 
determined in a special case where least-squares is appropriate. It is 
assumed that observations y are available for $n = 2l + 1$ values $x = X$
not subject to error and spaced at equidistant unit intervals. The least-
squares estimate is known to provide the linear combination of the y’s
providing an unbiased estimate of the true slope $\beta$ in the functional
relation

$$(1) \qquad Y = \alpha + \beta X$$

with minimum variance when the differences $y - Y$ are uncorrelated and
of constant variance $\sigma^2$. The least-squares estimate

$$b = \frac{\sum y (x - \bar{x})}{\sum (x - \bar{x})^2}$$

has error variance $\sigma^2 / \sum (x - \bar{x})^2$, where $\sum (x - \bar{x})^2 = \tfrac{1}{3} l (l + 1)(2l + 1)$
in the situation assumed in this section.
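For comparison the error variance of the estimate

$$(2) \qquad b' = \frac{\bar{y}_3 - \bar{y}_1}{\bar{x}_3 - \bar{x}_1}$$

of the last section is easily evaluated for any value of k. It is given by

$$\frac{2\sigma^2}{k (\bar{x}_3 - \bar{x}_1)^2} = \frac{2\sigma^2}{k (2l - k + 1)^2}.$$

The relative efficiency of $b'$ is thus

$$(3) \qquad E = \frac{3k (2l - k + 1)^2}{2 l (l + 1)(2l + 1)}.$$

This is a maximum when

$$(2l - k + 1)(2l - 3k + 1) = 0$$

with relevant root $k = \tfrac{1}{3}(2l + 1) = \tfrac{1}{3}n$.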


¹I am indebted to Professor Gerhard Tintner for drawing my attention to a previous discussion of
this problem, with a similar conclusion, by Nair and Shrivastava (4) (see also Nair and Banerjee (5)).
It might be noted that these authors propose using the two extreme groups out of three for location as
well as slope, but recommendation (a) above is theoretically preferable. In the first of these two papers
the extension of the method to fitting higher-order curves is also considered, though the optimum
efficiency is not so high in such cases.
We then have

$$(4) \qquad E = \frac{8}{9} \cdot \frac{(l + \tfrac{1}{2})^2}{l (l + 1)} > \frac{8}{9},$$

which may be compared with $E = \tfrac{3}{4}(l + \tfrac{1}{2})^2 / [l(l + 1)] > 3/4$ when
$k = \tfrac{1}{2}n$. The higher efficiency of $k = \tfrac{1}{3}n$ compared with $k = \tfrac{1}{2}n$ suggests
the adoption of $k = \tfrac{1}{3}n$ in preference to $k = \tfrac{1}{2}n$ in general. Indeed its
high efficiency in the case examined above indicates the occasional value 
of the simple method proposed even in cases where the least-squares 
method is available. 
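
The efficiency formula (3) is easy to tabulate. The short Python sketch below evaluates it exactly with rational arithmetic for one illustrative case, l = 10 (n = 21); the function name and the choice of l are assumptions made for illustration only.

```python
from fractions import Fraction


def relative_efficiency(k, l):
    """Efficiency of the grouped slope b' relative to least squares, equation (3)."""
    return Fraction(3 * k * (2 * l - k + 1) ** 2, 2 * l * (l + 1) * (2 * l + 1))


l = 10
n = 2 * l + 1

# Search all admissible sizes of the extreme groups (k cannot exceed l).
best_k = max(range(1, l + 1), key=lambda k: relative_efficiency(k, l))

print(best_k, n / 3)                            # best group size next to n/3
print(float(relative_efficiency(best_k, l)))    # three-group efficiency
print(float(relative_efficiency(l, l)))         # two groups of l points each
```

For this case the maximum falls at k = 7 = n/3, and the efficiency there is noticeably higher than when the points are split into two groups of ten.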


ASSESSMENT OF ACCURACY IN THE GENERAL CASE 


In the general problem it is assumed that both y and x are subject to
error. To use Wald’s confidence interval method it is assumed further
that the n errors $\eta = y - Y$ are independently and normally distributed
with constant variance $\sigma_\eta^2$; similarly the n errors $\epsilon = x - X$ are inde-
pendent and normal with variance $\sigma_\epsilon^2$; the x and y errors are moreover
mutually independent, so that the variance of $\eta - \beta\epsilon$ is $\sigma_\eta^2 + \beta^2 \sigma_\epsilon^2$.

Consider now possible ‘estimates’ of this last variance when $\beta$ is
known. If we write for the total sums of squares and products of x and
y within the three groups

$$S_{xx} = \sum_1 (x - \bar{x}_1)^2 + \sum_2 (x - \bar{x}_2)^2 + \sum_3 (x - \bar{x}_3)^2,$$

$$S_{xy} = \sum_1 (x - \bar{x}_1)(y - \bar{y}_1) + \sum_2 (x - \bar{x}_2)(y - \bar{y}_2) + \sum_3 (x - \bar{x}_3)(y - \bar{y}_3),$$

$$S_{yy} = \sum_1 (y - \bar{y}_1)^2 + \sum_2 (y - \bar{y}_2)^2 + \sum_3 (y - \bar{y}_3)^2,$$

where $\sum_i$ denotes summation over the observations in the i-th group,
then $(S_{yy} - 2\beta S_{xy} + \beta^2 S_{xx})/(n - 3)$ is an estimate of the variance
$\sigma_\eta^2 + \beta^2 \sigma_\epsilon^2$ with n − 3 degrees of freedom. The remaining 3 degrees of
freedom are contained in the three group means. One is represented by
the general mean, one by the difference between the means of the first
and third groups to be used in the estimate of $\beta$; the third is represented
by the difference between the mean of the second group and the general
mean of the first and third groups.

For data with few observations it is advisable to make use of the last 
degree of freedom in the variance estimate, as in the numerical example 
considered later. Alternatively, if it is not so used, it remains available 
for testing the linearity of the true X, Y relation. In the former case, 


the appropriate square to be added to the numerator of the previous
estimate is

$$\left\{ (\bar{y}_1 + \bar{y}_3 - 2\bar{y}_2)^2 - 2\beta (\bar{y}_1 + \bar{y}_3 - 2\bar{y}_2)(\bar{x}_1 + \bar{x}_3 - 2\bar{x}_2) + \beta^2 (\bar{x}_1 + \bar{x}_3 - 2\bar{x}_2)^2 \right\} \div \left\{ \frac{2}{k} + \frac{4}{n - 2k} \right\},$$

and the estimate $s^2(\beta)$ obtained with n − 2 now as the divisor will have
n − 2 degrees of freedom.

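Since

$$(\bar{x}_3 - \bar{x}_1)(b' - \beta) = (\bar{\eta}_3 - \beta\bar{\epsilon}_3) - (\bar{\eta}_1 - \beta\bar{\epsilon}_1),$$

when $b'$ is given by (2), the left-hand quantity under the assumptions
made in this section is normal with variance $(\sigma_\eta^2 + \beta^2 \sigma_\epsilon^2)(2/k)$. This is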
subject to one qualification, that the errors in the x variable do not influ- 
ence the allocation of the observations to the three groups. Such an 
effect may be neglected in many problems, particularly when the errors 
are small compared with the spacing of the observations at the points 
of division between the three groups; it will not be considered further 
here. A more detailed consideration of this point has been given by 
Wald (1). 

Under the same assumptions we have

$$t = \frac{(\bar{x}_3 - \bar{x}_1)(b' - \beta)\sqrt{\tfrac{1}{2}k}}{s(\beta)}.$$

Although the denominator depends on $\beta$, this t-variate enables a confi-
dence interval to be obtained for $\beta$. Thus for a value t corresponding to
any chosen probability value we have the interval determined by the
quadratic equation for $\beta$,

$$(5) \qquad (\bar{x}_3 - \bar{x}_1)^2 (b' - \beta)^2 \, \tfrac{1}{2}k = t^2 (s_y^2 - 2\beta s_{xy} + \beta^2 s_x^2),$$

where $s^2(\beta) = s_y^2 - 2\beta s_{xy} + \beta^2 s_x^2$.

If required, a similar method may be used to provide a joint confi-
dence region for $\alpha$ and $\beta$. If $a = \bar{y} - \beta\bar{x}$, then a is independent of the
numerator of t and of $s(\beta)$, and hence

$$F = \frac{\tfrac{1}{2}\left\{ n (a - \alpha)^2 + \tfrac{1}{2}k (\bar{x}_3 - \bar{x}_1)^2 (b' - \beta)^2 \right\}}{s^2(\beta)}$$

is a variance ratio with degrees of freedom 2, n − 2. For any chosen
probability value the corresponding critical value of F will determine an
ellipse as the boundary of the confidence region for $\alpha$ and $\beta$. This may
be compared with the corresponding region for the least-squares method
if it is known that $\sigma_\epsilon^2 = 0$; this region is similarly obtained from the
variance ratio

$$F = \frac{\tfrac{1}{2}\left\{ n (a - \alpha)^2 + (b - \beta)^2 \sum (x - \bar{x})^2 \right\}}{s^2},$$

where $s^2$ is the usual variance estimate of $y - Y$ obtained from the residu-
als of y with n − 2 degrees of freedom.

If, as suggested earlier in this section, it is desired to examine the
linearity of the functional relation, the variance estimate $s_{n-3}^2(\beta)$ of
$\sigma_\eta^2 + \beta^2 \sigma_\epsilon^2$ with n − 3 degrees of freedom must be used. The further
quantity

$$t = \frac{(\bar{y}_1 + \bar{y}_3 - 2\bar{y}_2) - \beta (\bar{x}_1 + \bar{x}_3 - 2\bar{x}_2)}{s_{n-3}(\beta) \sqrt{\dfrac{2}{k} + \dfrac{4}{n - 2k}}}$$

is then (if the linear relation is valid) also a t-variate with n − 3 degrees
of freedom. It will be noticed that it involves the unknown slope $\beta$.
When this is replaced by the estimate $b'$, the resulting statistic is no
longer exactly a t-variate, but might be treated approximately as such,
especially when $\bar{x}_1 + \bar{x}_3 - 2\bar{x}_2$ is small compared with $\bar{x}_3 - \bar{x}_1$.


NUMERICAL EXAMPLE 


As a numerical example consider fitting a straight line to the data on
penicillin ‘assay’ given by Davies (3, § 6.12). Six different concentra-
tions of pure penicillin were set up on a plate on which an agar medium
containing B. subtilis had been spread, and the mean circle diameters of
the zones of inhibition of growth of the organisms were measured (for
further details of the technique see § 5.41 of (3)). The concentration
had negligible error, so that the standard least-squares method was
available, the relation between circle diameter and log. concentration
being linear. With circle diameter y in mms. and 1 penicillin unit per
ml. as x = 1, and a two-fold increase in concentration as the unit for the
x scale, the regression equation of y on x was

$$(6) \qquad Y = 20.403 + 1.782(x - 3.5) = 14.166 + 1.782x$$

with a 95% confidence interval for the slope, based on the usual t-sta-
tistic, of (1.732, 1.832).

It is stressed that the data are considered again here purely in order 
to illustrate the present method. The six observations are divided into
three groups:

y   15.87  17.78,   19.52  21.35,   23.13  24.77   (Total 122.42)
x    1      2,       3      4,       5      6      (Total 21)

$$b' = \frac{(24.77 + 23.13) - (17.78 + 15.87)}{(6 + 5) - (2 + 1)} = 1.781.$$
Hence the estimated relation is

$$(7) \qquad Y = 20.403 + 1.781(X - 3.5) = 14.170 + 1.781X.$$


The sum of squares within each group has only one degree of freedom in 
this example, and may conveniently be calculated from the difference 
of the two observations per group. The other degree of freedom to be 
added is that for the contrast of the mean for the second group with 
the mean for the other two groups. This gives zero contribution for x,
and for y

$$24.77 + 23.13 + 15.87 + 17.78 - 2(19.52 + 21.35) = -0.19$$

with appropriate divisor. Hence

$$4 s_y^2 = \frac{(1.91)^2 + (1.83)^2 + (1.64)^2}{2} + \frac{(-0.19)^2}{12} = 4.8463,$$

$$4 s_{xy} = \frac{1 \times 1.91 + 1 \times 1.83 + 1 \times 1.64}{2} + \frac{0 \times (-0.19)}{12} = 2.69.$$

Equation (5), with t = 2.78 for 4 degrees of freedom (P = 0.05), gives

$$16(1.781 - \beta)^2 = (2.78)^2 (4.8463 - 2\beta \times 2.69 + 1.5\beta^2)/4$$

or $13.1018\beta^2 - 2\beta(23.2987) + 41.3879 = 0$

or $\beta = 1.778 \pm 0.058$.


Thus the 95% confidence interval for $\beta$ by this method is (1.720, 1.836),
an interval naturally slightly wider than the interval obtained by the 
least-squares method, since the assumption of no error in x has been 
dropped. 
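
The arithmetic of this example can be retraced with a short Python sketch. It assumes the grouping, the extra degree of freedom for the middle-group contrast, and the quadratic (5) exactly as described above; the function name `three_group_fit` and its argument layout are illustrative and not part of the paper, and the critical value 2.78 is the one quoted in the text.

```python
import math


def three_group_fit(x, y, k, t_crit):
    """Three-group slope, intercept, and confidence limits for the slope.

    k is the number of points in each extreme group and t_crit the two-sided
    critical value of t for n - 2 degrees of freedom. Assumes 0 < k < n / 2.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    groups = [order[:k], order[k:n - k], order[n - k:]]
    xbar = [sum(x[i] for i in g) / len(g) for g in groups]
    ybar = [sum(y[i] for i in g) / len(g) for g in groups]

    # Slope from the extreme groups, location from the general mean.
    slope = (ybar[2] - ybar[0]) / (xbar[2] - xbar[0])
    intercept = sum(y) / n - slope * sum(x) / n

    # Within-group sums of squares and products (n - 3 degrees of freedom).
    sxx = sxy = syy = 0.0
    for g, xm, ym in zip(groups, xbar, ybar):
        for i in g:
            sxx += (x[i] - xm) ** 2
            sxy += (x[i] - xm) * (y[i] - ym)
            syy += (y[i] - ym) ** 2

    # Extra degree of freedom: middle group against the two extreme groups.
    divisor = 2.0 / k + 4.0 / (n - 2 * k)
    cx = xbar[0] + xbar[2] - 2.0 * xbar[1]
    cy = ybar[0] + ybar[2] - 2.0 * ybar[1]
    sxx += cx * cx / divisor
    sxy += cx * cy / divisor
    syy += cy * cy / divisor

    # Quadratic in beta obtained by expanding equation (5).
    a = (xbar[2] - xbar[0]) ** 2 * k / 2.0
    q = t_crit ** 2 / (n - 2)
    a2 = a - q * sxx
    a1 = a * slope - q * sxy
    a0 = a * slope ** 2 - q * syy
    root = math.sqrt(a1 * a1 - a2 * a0)
    return slope, intercept, ((a1 - root) / a2, (a1 + root) / a2)


x = [1, 2, 3, 4, 5, 6]
y = [15.87, 17.78, 19.52, 21.35, 23.13, 24.77]
slope, intercept, (lower, upper) = three_group_fit(x, y, k=2, t_crit=2.78)
print(round(slope, 3), round(intercept, 3), round(lower, 3), round(upper, 3))
```

Because the sketch carries the unrounded slope through the quadratic, its limits can differ from the printed ones in the third decimal place.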
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RELATIONSHIP OF CATCH TO CHANGES IN POPULATION 
SIZE OF NEW ENGLAND HADDOCK 


By HOWARD A. SCHUCK
Aquatic Biologist 


Fish and Wildlife Service 
United States Department of the Interior 


INTRODUCTION 


THE UNITED STATES catch of haddock has fluctuated considerably
throughout the years and these fluctuations have generally been of
a declining nature. In 1929, the catch was about 260 million pounds, 
and in recent years it has averaged only about 150 million pounds. Fluc- 
tuations in the catch have been due in large part to variations in actual 
abundance, or the size of the stock of commercial sizes of haddock on the 
banks in different years. We are, therefore, interested in obtaining an 
accurate measure of the size and the composition of the stock, to measure 
its changes throughout the years, and to determine what factors have 
been most responsible for such changes. Changes in the stock from year 
to year are the result of varying rates of removals and additions. There- 
fore, besides determining the size of the stock in different years, it is 
necessary to measure the yearly removals from the stock by catch and 
natural mortality, and the yearly additions by recruitment and growth. 

If these variables could be measured accurately, we should be in a 
position to evaluate their relative importance in determining the size of 
the stock and to determine whether the size of the spawning stock and 
of other stocks affects recruitment. With such information and other 
general life history facts, it should be possible to determine at what 
level the stock should be maintained, to determine what mode and 
intensity of fishing will result in the maximum sustained production of 
haddock, and to make periodic predictions as to future production of the
fishery for the industry.
A basic equation is:


S + (G + R + M) − (C + N + M₁) = S₁
where:


S = size of population at the beginning of the year.

S₁ = size of population at the end of the year.

G = additions to the population during the year by growth.

R = additions to the population during the year by recruitment of
young.

M = additions due to immigrations.

C = deductions from the population during the year by the fishery.

N = deductions from the population during the year due to natural
mortality.

M₁ = deductions due to emigrations.


FIGURE 1.


LOCATION OF FISHING BANKS OFF NEW ENGLAND, NOVA SCOTIA, AND
NEWFOUNDLAND.


It is believed that the population of haddock inhabiting the New 
England Banks (Georges) (Fig. 1) is largely independent of the popula- 
tions on the Nova Scotian and Newfoundland Banks. Assuming this 
to be true, and if we consider the population on Georges Bank only, 
there will be no important changes in the stock from year to year due to 
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TABLE 1 
RELATIVE SIZE OF THE GEORGES BANK HADDOCK POPULATIONS IN TERMS OF 
THE AVERAGE NUMBERS OF FISH PER DAY TAKEN BY A STANDARD GROUP OF 
OTTER TRAWLERS 


             Numbers
Year         per day
1931          3,032
1932          4,324
1933          3,630
1934          4,049
1935          4,927
1936          5,590
1937          4,404
1938          4,833
1939          5,502
1940          4,979
1941          6,960
1942          7,941
1943          7,319
1944          5,737
1945          5,347
1946          4,956
1947          4,954
Average       5,205


immigrations or emigrations; and M and M₁ can be left out of the
equation.

Also, if we consider the population as numbers, rather than pounds 
of fish, G or “growth”, can be left out too. Furthermore, if we define 
the population S as being the number of fish of certain year classes at 
the beginning of a year and S₁ as the number of fish of the same year
classes at the end of that year, then there can be no recruitment; and R
can also be ignored. Thus, the equation for certain purposes can be
reduced to:

S − (C + N) = S₁
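
As a small illustration of how the full and the reduced equations relate, the balance can be written as a single Python function in which the terms that drop out default to zero. The function name, argument names, and the numbers in the example are hypothetical and serve only to show the bookkeeping.

```python
def end_of_year_stock(start, catch, natural_mortality,
                      growth=0.0, recruitment=0.0,
                      immigration=0.0, emigration=0.0):
    """S1 = S + (G + R + M) - (C + N + M1).

    With growth, recruitment, immigration, and emigration left at zero
    this is the reduced form S - (C + N) = S1.
    """
    additions = growth + recruitment + immigration
    deductions = catch + natural_mortality + emigration
    return start + additions - deductions


# Hypothetical stock of 1,000 fish with 300 caught and 200 dying naturally.
print(end_of_year_stock(1000, catch=300, natural_mortality=200))
```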

Available for use in this equation are biological and statistical data 
for the Georges Bank population going back to 1931. These data were 
assembled by the Haddock Investigation of the United States Fish and 
Wildlife Service and its predecessor agency, the United States Bureau of 
Fisheries. 

The remainder of this paper will be devoted to: (1) developing an 
index representing the size of the population in terms of numbers of 
FIGURE 2.


RELATIVE SIZE OF THE POPULATION, IN TERMS OF THOUSANDS OF FISH PER DAY,
BY YEARS.


haddock of definite ages and year classes, at the beginning and end of 
yearly periods (S and S₁); (2) measuring the fishery removals (C) of
haddock of each age during each of the 17 years, 1931-47; and (3)
determining how important the yearly fishery removals are in decreasing
the stock from the beginning to the end of yearly periods.


SIZE OF THE STOCK OR “S” AND “S₁”


Total catch represents fishing removals and in itself is a vital piece 
of information. It does not, however, represent abundance, or the rela- 
tive size of the population on the Bank, inasmuch as the amount of fish- 
ing effort utilized to make the catch varies among years. 

The index representing the relative size of the population that was
used in the Haddock Investigation is the average yearly catch per day¹
of a standard group of large otter trawlers which fished out of Boston
during this 17-year period. The relative size of the population² was first
expressed in terms of the average number of pounds per day taken by
these trawlers in each year. By the use of yearly average weight data,
the statistics on relative population size were converted from pounds to
numbers of fish (Table 1, Fig. 2).


¹Details of this abundance analysis were developed by W. C. Herrington and G. A. Rounsefell.

In each year and season a sample of the haddock that were landed 
had been obtained, and from those fish obtained, scales had been col- 
lected. Then, for each year and season, the ages of these sample fish 
were determined. This determination was made by studying the pro- 
jected impression of these scales. Figure 3 shows a photograph of such 
a microprojection of a scale from a Georges Bank haddock. 

The fish were aged as having completed their first, second, third, 
fourth, fifth, sixth, seventh, eighth, and ninth year of life, and were 
correspondingly classified as fish of 1 to 9 years of age. The category 
of 9-year-olds includes 9-year-old and older fish. (The number of 
haddock of ages greater than 9 years was very small, amounting in the 
aggregate to less than one-half of one percent of all haddock in the 
catch.) 

By using the percentage age composition that had been computed for 
haddock of each length and for each year and season, the total numbers 
of fish caught per day were reduced to numbers per day of each age 
(Table 2 and Fig. 4). The average abundance for all 17 years (Fig. 5) 
amounted to 116 one-year-olds, 1,472 two-year-olds, 1,571 three-year- 
olds, 920 four-year-olds, 557 five-year-olds, 324 six-year-olds, 149 seven- 
year-olds, 61 eight-year-olds and 34 nine-year-old and older fish. It can 
be concluded that the relative abundance of fish in the catch diminishes 
quite regularly for those fish three years old and older. The fact that 
the one- and two-year-old fish are less abundant indicates that these age 
groups are not fully available to the fishery. 

In order to measure the diminution in the stock over the period of a 
year, it was desired to compare the relative population size of the fully- 
available age groups of fish at the beginning of each year with the size 
of the corresponding stock at the end of the year. Table 2 gives the 
average population size for the “haddock” year.³ In order to obtain a


2Where “‘population size’’ is mentioned in the remainder of this paper it refers to this index of 
relative population size. Although it has not yet been possible to determine the exact relationship 
between the actual number or pounds of fish in the stock and our calculated index of relative population 
size, the index appears to suffice for the purpose used here. 


3The “haddock” year consists of seasons A, B, C, and D as follows: 
A—February, March, and April (spawning season) 
B—May, June, and July 
C—August, September, and October 
D—November, December, and January 
FIGURE 3.


PHOTOGRAPH OF A SCALE FROM A HADDOCK THAT HAD JUST COMPLETED ITS
FOURTH YEAR WHEN CAUGHT APRIL 1939 ON GEORGES BANK. THE MARKS INDICATE
THE COMPLETION OF EACH YEAR OF GROWTH. THE LENGTH OF THIS FISH
WAS 22 3/4 INCHES.
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TABLE 2 


RELATIVE POPULATION OF EACH AGE OF GEORGES BANK HADDOCK IN TERMS OF 
NUMBERS CAUGHT PER DAY 


                                      Age in years

Year     All ages      1       2       3       4       5       6       7       8    9 and older
1931       3,032     147     691     158     439     699     466     256     132       44
1932       4,324      11     210   2,829     275     413     323     146      74       43
1933       3,630      44     986     720   1,145     249     193     145      67       81
1934       4,049     141     966   1,108     690     678     241     125      60       40
1935       4,927     202   1,704   1,306     574     509     428      97      74       33
1936       5,590     157   1,752   1,834     920     402     236     222      41       26
1937       4,404     150   1,233   1,327     698     535     251     119      65       26
1938       4,833     165   2,590     988     489     234     199     114      31       23
1939       5,502      95   1,775   2,416     640     271     123     108      42       32
1940       4,979     524   1,116   1,689   1,018     309     184      93      28       18
1941       6,960     144   3,298   1,275   1,046     752     233     123      40       49
1942       7,941      94   3,036   2,567   1,037     624     362     158      36       27
1943       7,319      11   1,026   3,470   1,551     530     492     149      61       29
1944       5,737      14     135   1,412   2,609     948     416      95      91       17
1945       5,347      25   1,663     420   1,244   1,218     485     194      61       37
1946       4,956      24     856   1,992     400     854     562     217      49        2
1947       4,954      18   1,996   1,189     863     250     314     180      89       55
Total     88,484   1,966  25,033  26,700  15,638   9,475   5,508   2,541   1,041      582
Avg.       5,205     116   1,472   1,571     920     557     324     149      61       34


value of the population size at the beginning of the year while eliminating 
the effect of the seasonal cycle in availability it was necessary to recom- 
pute these data. 

All data originally had been computed on a seasonal basis: for exam- 
ple, Table 3 shows the seasonal population size data from which Table 2 
was derived. In order to obtain values for the population size that more 
closely represent values at the beginning of each year, the abundance 
values for seasons C, D, A, and B in Table 3 were averaged. In this 
recombination it was necessary to consider that 3-year-old fish in seasons 
C and D become 4-year-old fish in seasons A and B of the following 
year and that other ages progress accordingly. 

For example, to obtain the relative size of the population of 4-year-old 
fish at the beginning of 1935 the following figures’ were used: 


4It is recognized that by summarizing values for 4 seasons and dividing by 4, the average obtained
does not under some conditions represent the average of the midpoint and thus the exact beginning of
the year. For the purpose of this analysis, however, such a calculation represents the beginning of the
year accurately enough.

5An exception to this rule was made in computing the relative population of 9-year-old haddock
at the beginning of the year. Since this group includes all older haddock, 8-year-old and 9-year-old
haddock from seasons C and D were added to 9-year-old haddock from seasons A and B and the total
of these 6 figures, instead of the usual 4, was divided by 4 to give the average.
FIGURE 4.
RELATIVE SIZE OF HADDOCK POPULATION OF AGES 1-9, FOR EACH OF THE 17 YEARS.


                                              Number of fish
                                                 per day

3-year-old haddock, season C, 1934 ........      1,425
3-year-old haddock, season D, 1934 ........        431
4-year-old haddock, season A, 1935 ........        583
4-year-old haddock, season B, 1935 ........        889
Total .....................................      3,328
Average ...................................        832
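
The same averaging can be expressed as a short Python sketch. The function name and argument names are illustrative assumptions; the four inputs are the seasonal figures listed above.

```python
def beginning_of_year_index(prev_season_c, prev_season_d, season_a, season_b):
    """Average of seasons C and D (fish one year younger, preceding year)
    and seasons A and B (fish of the stated age, current year)."""
    return (prev_season_c + prev_season_d + season_a + season_b) / 4


# 4-year-old haddock at the beginning of 1935.
index_1935_age4 = beginning_of_year_index(1425, 431, 583, 889)
print(index_1935_age4)
```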


Using this system the relative sizes of the population of 4- to 9-year- 
old fish at the beginning of each year were computed and are shown in 
FIGURE 5.
RELATIVE POPULATION SIZE OF HADDOCK OF AGES 1-9. AVERAGE OF ALL 17 YEARS.


Table 4. Since computation of the catch-per-day of 3-year-old fish at 
the beginning of the year involved use of figures for the less available 
2-year-old fish in seasons C and D, it was decided to omit the 3-year 
group. 

The next step was to decide whether to consider the age groups sepa- 
rately or in the aggregate. Examination of the data in Table 3 indicated 
that the decrease in catch-per-day for individual year classes from year 
to year was rather variable, hence, it was desirable to combine age groups. 
Thus the total of all fish of ages 4 to 9 years for the beginning of each 
year are shown in the right-hand column of Table 4. 

It was next necessary to compute the size of the population at the 
end, in addition to at the beginning, of each year. The average popula- 
tion at the beginning of the year or seasons (C + D + A + B)/4, 
approximates the value of the midpoint between D and A. Therefore, 
values for the number of fish at the beginning of the year are the same as 
values for the number at the end of the preceding year. 

For example, from Table 4, if there are 1,793 five-year-old fish per 
day at the beginning of 1945, then there are 914 (the number of 6-year- 
olds at the beginning of 1946) 5-year-olds at the end of 1945. 
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TABLE 3 


RELATIVE SIZE OF POPULATION OF EACH AGE OF GEORGES BANK HADDOCK 
BY SEASONS IN NUMBERS CAUGHT PER DAY 


Age in years 
Year | Season = 
No. all 1 2 3 4 5 6 if 8 9 and 
years older 
1931 A 3,268 30 193 560 | 1,088 781 372 55 89 
B 3,182 81 70 770 | 1,139 630 336 37 
Gi 2,562 fan's 897 186 245 348 353 269 224 40 
D 3,114 587 LoD) 183 179 222 99 45 2 12 
1932 A 4,281 11 | 2,418 489 707 387 143 80 46 
B 4,937 38 3,030 359 410 351 275 95 76 
G 5,848 3 430 | 4,253 149 343 458 96 9 25 
D Dees 41 SOL 2 3L0 103 191 96 70 27 
1933 A 3,697 ss 112 254 | 1,728 411 423 324 183 262 
B 4,349 tas 1,318 494 | 1,814 277 185 161 56 44 
(E! 4,487 39 | 1,724 | 1,461 875 207 93 63 il7/ 8 
D 1,988 138 789 671 164 100 72 31 12 11 
1934 A 3,729 sats 4 | 1,360 471 874 574 198 150 98 
B 4,299 290 | 1,217 | 1,368 866 209 252 41 56 
G 4,619 isn 1,929 | 1,425 544 502 148 33 37 1 
D 3,549 565 | 1,640 431 377 472 33 ily 10 4 
1935 A 3,215 23 884 583 748 607 185 86 99 
B 5,536 i salaliz( |i) als calle) 889 740 88 52 131 4 
¢ 5,495 16 | 2,443 | 1,848 590 367 152 67 9 3 
D 5,462 791 | 3,285 773 234 179 71 84 71 24 
1936 A 5,827 267 | 2,078 | 1,688 903 397 356 47 91 
B 6 Pale Bolle OOM menus 997 517, 341 377 103 
Gs Grid 93 | 3,618 | 1,537 692 41 69 110 2 9 
D 3,143 582) | 1,361 603 303 145 138 45 ial 5 
1937 A 5 , 224 il 423 | 1,810 | 1,167 988 500 176 137 22 
4,969 se 1,068 ,858 949 552 282 161 54 45 
G 5) les) SOL 2 avo | e2Gp) 439 243 24 51 21 14 
D 2,247 237 685 375 237 356 196 90 46 25 
1938 A 3,078 363 | 1,015 694 341 390 159 56 60 
B 4,736 ‘ PASS) |) al\ sk 614 288 270 178 13 id 
(GF 7,425 an 5,500 | 1,241 384 162 54 52 26 6 
D 4,092 660 | 2,363 463 264 144 81 68 28 eal 
1939 A 4,463 bi 287 | 2,260 873 515 191 179 66 92 
B 6,629 cr 1,604 | 3,371 958 251 223 143 66 13 
G 6,848 30 | 3,057 | 3,157 424 148 16 13 3 
D 4,066 349 | 2,150 877 307 174 56 96 34 "23 


Also, the aggregate population of 4-year-old and older fish at the 
beginning of a year would produce survivors at the end of the year 
which would amount to the number of 5-year and older fish at the begin- 
ning of the next year. For example, if the total population of 4-year-old 
and older fish at the beginning of 1944 was the total of 3,188 four-year- 
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TABLE 3—Continued 


Ages in years 
Year | Season | 
No. all 1 2 3 + 3) 6 if 8 9 and 
yous older 
1940 A 2,805 he 127 | 1,057 |} 1,046 337 141 62 24 21 
B 6,245 Pe 1,200 | 2,449 UG YA 509 234 268 42 22 
Cc 5,638 224 | 2,175 || 1,832 922 161 289 10 23 5 
D 5,228 | 1,876 961 1,415 582 230 (es) 43 24 24 
1941 A 5,855 i 1,463 LBD 1,289 | 1,003 181 135 22 26 
B 7,692 ae 2,165 |} 1,732 | 1,639" | 15222 497 209 96 132 
w 9,082 128 | 6,498 | 1,075 757 381 123 81 21 18 
D 5,210 448 | 3,068 557 498 403 131 66 19 20 
1942 A 5,863 neha 290 | 2,599 | 1,083 | 1,006 598 224 35 28 
B 8,769 ae, 1,636 | 3,780 | 1,674 858 444 244 74 59 
C 9,058 39 | 4,875 | 2,898 703 PH 232 93 3 3 
D 8,074 336 | 5,344 989 688 421 73 pill 19 
1943 A 7,067 aioe 116 | 3,282 | 2,069 688 668 102 114 28 
B 8,247 oe 380 | 4,048 | 1,844 831 689 287 97 71 
a4 7,316 isle 1,892 3,353 1,277 429 291 59 vi 8 
D 6 , 647 45>.°1,716 | 3,294 | 1,013 173 320 146 29 aah 
1944 A 7,107 Rats 1} 1,207 | 3,622 | 1,223 680 158 202 16 
B 6,029 35 | 1,396 | 2,586 | 1,385 448 64 81 34 
¢ 6,792 tts 309 | 2,362 3,055 730 224 78 34 
D 3,020 54 193 681 ae: 458 312 80 48 20 
1945 A 4,687 chs 33 499 | 1,023 1,851 762 351 138 30 
B 5,640 ae 1,555 374 1,862 | 1,097 469 193 12 78 
C 7,293 ot | 3,277 668 1 502 1 sea 462 162 BA 21 
D 3,769 67 | 1,783 138 589 792 248 69 58 25 
1946 A 4,768 site 45 | 1,940 487 | 1,238 804 140 113 1 
B 6,630 sae 735 | 3,040 333 1,059 928 446 80 9 
C 4,403 45 | 1,306} 1,723 310 535 268 215 vans i 
D 4,024 52 1,334 | 1,270 471 582 246 69 
1947 A 4,382 ae 58 | 1,258 | 1,628 394 507 293 149 100 
B 3,589 mar 1,052 | 1,015 600 264 337 75 94 52 
Cc 8,122 1 | 4,904 1,831 782 169 253 115 41 26 
D Biri 73 | 1,969 658 442 171 159 136 71 42 
Avg. A 4,666 ees 215 | 1,021 1,206 842 505 209 103 65 
all B 5,806 con 1,069 | 2,015 | 1,223 Vou 436 225 74 43 
years iC: 6,255 59 | 2,799 | 1,889 805 359 206 92 35 11 
D 4,093 403 | 1,807 858 448 307 147 72 33 18 


olds, 1,224 five-year-olds, 432 six-year-olds, 208 seven-year-olds, 122 
eight-year-olds, and 26 nine-year-old and older fish, or a total of 5,200; 
then the survivors from this group of year classes, after an interval of 
one year, would be the number of 5- to 9-year-old and older fish at the 
beginning of 1945, or 1,793 five-year-olds, 604 six-year-olds, 270 seven- 
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TABLE 4 
SIZE OF HADDOCK POPULATION AT BEGINNING OF EACH YEAR¹


                            Age in years

Year          4        5        6        7        8    9 and older    Total
1932        304      385      327      218      122       108         1,464
1933      2,276      235      286      260      101       120         3,278
1934        993      695      272      154       71        51         2,236
1935        832      602      616      104       67        39         2,260
1936      1,326      561      321      239       75        50         2,572
1937      1,064      634      242      136       86        24         2,186
1938        737      326      315      139       52        43         1,612
1939        884      354      180      114       63        46         1,641
1940      1,650      394      174       98       44        26         2,386
1941      1,544      932      267      176       43        58         3,020
1942      1,097      780      456      181       64        42         2,620
1943      1,950      728      498      198       94        39         3,507
1944      3,188    1,224      432      208      122        26         5,200
1945      1,482    1,793      604      270       77        52         4,278
1946        406    1,097      914      324      106        37         2,884
1947      1,305      360      490      246      132        38         2,571
Total    21,038   11,100    6,394    3,065    1,319       799        43,715
Average   1,315      694      400      192       82        50         2,733

1Values are the average of the number of fish of the particular age from Seasons A and B of the 
year in question, and of the number of fish of 1 year younger from Seasons C and D of the preceding year. 


year-olds, 77 eight-year-olds, and 52 nine-year-old and older fish, or a
total of 2,796.

Having already obtained the total number of 4-year-old and older 
fish at the beginning of the year (from Table 4), and having now com- 
puted the total number of 4-year-old and older fish at the end of each 
year (the number of 5-year-old and older at the beginning of the next 
year), all such totals were entered in Table 5. 

Computation of the yearly diminution of the stocks being measured 
was then only a matter of subtracting the value representing the stock 
at the end of the year from the value representing the stock at the begin- 
ning of each year. 
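
The subtraction can be sketched in Python for a few of the years. The dictionaries below hold only the Table 4 totals and 4-year-old figures needed for the illustration, and the names used are assumptions of this sketch rather than terms from the paper.

```python
# Beginning-of-year figures from Table 4 (selected years only).
total_4_to_9 = {1932: 1464, 1933: 3278, 1944: 5200, 1945: 4278, 1946: 2884}
age_4 = {1932: 304, 1933: 2276, 1944: 3188, 1945: 1482, 1946: 406}


def yearly_decrease(year):
    """Return (stock at beginning, stock at end, decrease) for one year.

    The stock at the end of a year is taken as the number of 5- to
    9-year-olds at the beginning of the following year, that is, the
    following year's total less its 4-year-olds.
    """
    start = total_4_to_9[year]
    end = total_4_to_9[year + 1] - age_4[year + 1]
    return start, end, start - end


for year in (1932, 1944, 1945):
    print(year, yearly_decrease(year))
```

The three years shown reproduce the corresponding rows of Table 5.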


THE FISHERY REMOVALS OR “‘C” 


A measure of the yearly decreases in population size of completely 
available fish from year to year thus had been obtained. Inasmuch as 
TABLE 5
RELATIVE SIZE OF POPULATION OF CERTAIN AGES OF HADDOCK AT THE
BEGINNING AND END OF EACH OF 15 YEARS

            Number of 4-      Number of 4-
            to 9-year-olds    to 9-year-olds
Year        at beginning      at end            Decrease
            of year           of year¹
1932            1,464             1,002             462
1933            3,278             1,243           2,035
1934            2,236             1,428             808
1935            2,260             1,246           1,014
1936            2,572             1,122           1,450
1937            2,186               875           1,311
1938            1,612               757             855
1939            1,641               736             905
1940            2,386             1,476             910
1941            3,020             1,525           1,495
1942            2,620             1,557           1,063
1943            3,507             2,012           1,495
1944            5,200             2,796           2,404
1945            4,278             2,478           1,800
1946            2,884             1,266           1,618
Total          41,144            21,519          19,625
Average         2,743             1,435           1,308

¹End of year = number of 5- to 9-year-olds at beginning of following year.



the purpose of this study was to determine to what extent such decreases 
were associated with, or were the result of, the removals by the fishery, 
it was necessary next to determine how many fish the fishery had taken 
from the population in the various years. 

The fishery removals for the years 1931-47⁶ were first tabulated in
terms of pounds of fish. Having also the average weights of these fish 
that were landed, the total numbers caught were easily computed. The 
total pounds and numbers are shown in Table 6, and the total numbers 
in Figure 6. 
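
The conversion between pounds and numbers is a simple division. The sketch below is illustrative only: the yearly average weights are not given in this section, so the function `implied_mean_weight` (a hypothetical name) merely backs out the weight per fish implied by the totals of Table 6.

```python
def numbers_caught(pounds_millions, mean_weight_lb):
    """Millions of fish = millions of pounds / average weight per fish (lb)."""
    return pounds_millions / mean_weight_lb


def implied_mean_weight(pounds_millions, fish_millions):
    """Average weight per fish (lb) implied by a pair of Table 6 entries."""
    return pounds_millions / fish_millions


# Totals for 1931-47 from Table 6: 1,405.614 million pounds, 550.016 million fish.
overall_weight = implied_mean_weight(1405.614, 550.016)
print(round(overall_weight, 2))  # implied pounds per fish over the whole period

# Round trip: dividing the pounds by the implied weight returns the numbers.
print(round(numbers_caught(1405.614, overall_weight), 3))
```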

The numbers caught were then reduced to numbers of each age by 
utilizing the percentage-age compositions referred to earlier. The
numbers of fish of each age removed by the fishery in each of the
17 years, summarized by size groups and season, are shown in Table 7.

⁶The landings for the ports of Boston, Gloucester, and New Bedford, Mass., and Portland, Me.
TABLE 6
TOTAL CATCH OF HADDOCK FROM NEW ENGLAND BANKS

              Millions      Millions
Year          of pounds     of fish
1931           101.801       34.979
1932            86.706       32.348
1933            70.272       26.623
1934            39.683       15.617
1935            68.579       28.565
1936            73.496       31.489
1937            83.973       32.528
1938            80.202       33.570
1939            91.181       38.911
1940            81.676       31.345
1941           111.611       46.944
1942            97.786       41.299
1943            80.215       33.036
1944            84.265       29.062
1945            65.284       22.091
1946            90.802       32.678
1947            98.082       38.931
Total        1,405.614      550.016
Average         82.683       32.354


DECLINE IN THE SIZE OF STOCK AS ASSOCIATED WITH 
VARIATIONS IN THE CATCH. 


In an earlier section of this paper the yearly declines in the relative 
size of the stocks of those ages of haddock that were fully available to 
the fishery were computed (Table 5). In the section just completed, the 
catch of fish of each age in each year has been computed (summarized in 
Table 7). By summing the catches of fish of 4 to 9 years of age inclusive, 
in each year, the numbers of fish that were removed from the corre- 
sponding stock between the beginning and the end of the years involved 
were computed. Thus, in Table 8 data are presented which represent: 


(1) the decrease in the relative size of the stocks of 4- to 9-year-old 
fish from the beginning to the end of each of the 15 years (1932- 
46) in thousands of fish per day, and 

(2) the number of fish removed by the fishery from these stocks 
during each of these 15 yearly intervals, in millions. 
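
As a check on item (2), the yearly removals can be rebuilt by adding the age 4 to "9 and older" columns of Table 7. The sketch below does this for three sample years; the values are transcribed from Table 7, and the 1932 "9 and older" entry (.324) is the reading that makes the 1932 row add to its printed total.

```python
# Catch by age, millions of fish, ages 4, 5, 6, 7, 8, and 9 and older (Table 7).
catch_by_age = {
    1932: [2.008, 3.051, 2.383, 1.079, 0.553, 0.324],
    1944: [13.047, 4.945, 1.991, 0.435, 0.412, 0.093],
    1946: [2.406, 5.000, 3.289, 1.589, 0.224, 0.019],
}

# Removals from the 4- to 9-year-old stock in each year.
removals = {year: round(sum(ages), 3) for year, ages in catch_by_age.items()}

for year in sorted(removals):
    print(year, removals[year])
```

The three sums can be compared with the catch column of Table 8 for the same years.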
FIGURE 6.
CATCH OF GEORGES BANK HADDOCK IN TERMS OF NUMBERS OF FISH
(REMOVALS FROM THE GEORGES BANK POPULATION, 1931-1947).


A casual observation of this table shows, in general, that in years 
during which large numbers of fish were taken from the Georges Bank 
population, there were also large declines in the population size from the 
beginning to the end of the year, and that in years during which small 
numbers were removed, the population changed but little. 

These data have been plotted in Figure 7, with the ‘removal’ or 
“catch (C)” as the independent variable, and with the decline, the 
change in population size from the beginning to the end of the year, as 
the dependent variable. This figure is plotted in a rather unusual man- 
ner, with values of the dependent variable being plotted below, rather 
than above the origin. This has been done inasmuch as values of the 
dependent variable (change in population size) are actually decreases 
rather than increases, and it has been found that this method of plotting 
is more easily interpreted by some people. 
TABLE 7
AGE COMPOSITION OF CATCH, BY YEARS, IN MILLIONS OF FISH

                                      Age in years
Year          1        2        3        4        5        6        7        8    9 and     Total
                                                                                  older
1931      1.661    8.167    1.802    5.089    7.975    5.291    2.949    1.555     .490    34.979
1932       .099    1.712   21.139    2.008    3.051    2.383    1.079     .553     .324    32.348
1933       .210    7.366    5.218    8.648    1.791    1.346    1.030     .464     .550    26.623
1934       .296    3.807    4.470    2.889    2.518     .825     .482     .197     .133    15.617
1935      1.144   11.096    7.803    3.138    2.467    2.053     .415     .360     .089    28.565
1936       .828   11.449   10.171    4.629    1.803    1.153    1.140     .217     .099    31.489
1937      1.193   10.129    9.715    4.890    3.574    1.608     .815     .416     .188    32.528
1938       .961   18.453    6.866    3.304    1.568    1.312     .765     .198     .143    33.570
1939       .565   12.806   17.379    4.383    1.807     .804     .695     .272     .200    38.911
1940      1.895    6.692   11.061    7.261    2.188    1.286     .653     .191     .118    31.345
1941       .697   21.404    9.026    7.389    5.303    1.632     .860     .280     .353    46.944
1942       .290   13.106   14.877    5.938    3.648    2.131     .941     .205     .162    41.299
1943       .016    3.653   15.659    7.423    2.742    2.385     .688     .313     .157    33.036
1944       .054     .675    7.410   13.047    4.945    1.991     .435     .412     .093    29.062
1945       .101    7.046    1.698    5.285    4.862    1.941     .766     .223     .169    22.091
1946       .191    6.709   13.251    2.406    5.000    3.289    1.589     .224     .019    32.678
1947       .088   15.547    9.604    6.660    1.979    2.542    1.397     .690     .424    38.931
Total    10.289  159.817  167.149   94.388   57.221   33.972   16.700    6.770    3.710   550.016
Average    .605    9.402    9.833    5.552    3.366    1.998     .982     .398     .218    32.354


The straight line in Figure 7 was fitted to the data by the method of 
least squares and has the equation, 
Y = -.022 + .1135X, where
X = millions of haddock of ages 4-9 years removed from the stock in 
each of 15 years by the fishery. 
Y = decrease in relative population size of 4- to 9-year-old haddock 
during each of these 15 years in thousands of fish per day. 


The coefficient of correlation measuring the degree of association 
between these two variables is 0.81. With 13 degrees of freedom this 
value proves to be highly significant (1 per cent level = 0.64). R² is
about 0.66. Thus, it seems valid to conclude (under the assumption 
that the straight line best fits these data) that about 66 per cent of the 
variability in yearly decreases in population size, from the beginning 
to the end of the individual years, is explainable by the variations in 
the numbers of fish actually removed from the stock by the fishery. 
This value of 66 per cent is possibly a minimum estimate of the effect 
of the fishery, inasmuch as the index of abundance is probably not
perfectly correlated with actual abundance. 

No attempt was made in this treatment to determine whether some 
curved line fitted these data better than this straight line. 
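
The straight-line fit and the correlation coefficient can be rechecked from the 15 pairs of Table 8. The following is a minimal sketch in plain Python, not the computing routine used in the study; the function name `least_squares` is illustrative, and the 1937 decrease (1.311) and the 1941 catch (15.817) are the values implied by Tables 5 and 7.

```python
import math

# X: catch of 4- to 9-year-old haddock, millions of fish (Table 8, 1932-46).
catch = [9.398, 13.829, 7.044, 8.522, 9.041, 11.491, 7.290, 8.161,
         11.697, 15.817, 13.026, 13.708, 20.923, 13.246, 12.527]
# Y: decrease in stock index, thousands of fish per day (Table 8, 1932-46).
decrease = [0.462, 2.035, 0.808, 1.014, 1.450, 1.311, 0.855, 0.905,
            0.910, 1.495, 1.063, 1.495, 2.404, 1.800, 1.618]


def least_squares(x, y):
    """Return slope, intercept and correlation for the line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = least_squares(catch, decrease)
degrees_of_freedom = len(catch) - 2  # 15 pairs less 2 fitted constants
print(round(slope, 4), round(intercept, 3), round(r, 2), degrees_of_freedom)
```

On these inputs the slope comes out near 0.114, the intercept near -0.02, and r near 0.81 on 13 degrees of freedom.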
TABLE 8
DECREASE IN SIZE OF STOCK OF 4- TO 9-YEAR-OLD HADDOCK FROM
THE BEGINNING TO THE END OF YEARS 1932-46, AND THE TOTAL
CATCH OF THESE AGES IN EACH YEAR

            Decrease in stock      Catch in
Year        thousands of fish      millions
            per day                of fish
1932              .462               9.398
1933             2.035              13.829
1934              .808               7.044
1935             1.014               8.522
1936             1.450               9.041
1937             1.311              11.491
1938              .855               7.290
1939              .905               8.161
1940              .910              11.697
1941             1.495              15.817
1942             1.063              13.026
1943             1.495              13.708
1944             2.404              20.923
1945             1.800              13.246
1946             1.618              12.527
Total           19.625             175.720
Average          1.308              11.715


This line was arbitrarily extrapolated beyond the limits of the data 
towards the origin, although admittedly the exact position of the line 
where the removals are very small is unknown. It can be seen from 
Figure 7 that the intercept is practically at the 0.0 point. The position 
of this intercept, assuming that the population index actually represents 
the size of the population, poses interesting theoretical possibilities. 

First of all, the suggestion is raised that within the ranges of popula- 
tion size and fishing removals represented by these data, the losses due 
to factors other than the fishing removals, i.e., to natural mortality, 
may be negligible. Such a possibility could theoretically be true under 
the conditions of an intensive fishery, where fishing removals would take 
many fish which would otherwise be removed by natural causes. When 
one takes into consideration the relative lack of bottom dwelling preda- 
tors on Georges Bank that are large enough to consume haddock of 
2-10 pounds and the apparent lack of any disease epidemic or serious 
parasitism in haddock over the 17-year period, this possibility does not 
appear quite so improbable. 
FIGURE 7.
THE RELATIONSHIP BETWEEN THE YEARLY REMOVALS IN MILLIONS OF HADDOCK
AND THE DECREASE IN RELATIVE POPULATION SIZE FROM BEGINNING TO THE
END OF THESE SAME YEARLY PERIODS IN TERMS OF THOUSANDS OF FISH PER DAY.
(Figure title: Decrease in population size as affected by catch. Abscissa:
removals by the fishery, millions of fish. Ordinate: decreased population
size, thousands of fish per day.)


Secondly, the exact position of the line with populations of this 
general size, if the removals were greatly reduced and even became zero, 
is unknown. If the extrapolation, as in Figure 7, happens to be an accu- 
rate representation of this relationship, then one would conclude that 
with no fishing removals, as would occur if fishing were to suddenly cease, 
there would be no decrease in the stock and thus no natural mortality. 
Theoretically, however, if fishing were to be considerably reduced sud- 
denly, natural mortality would probably be greater than at present 
because some of the fish now being caught would be vulnerable to what- 
ever causes of natural mortality are in operation. With populations of 
present levels but with very small fishing removals, the line would 
possibly curve toward the Y axis and intersect it at some point greater
than 0. 

The data and the ideas expressed here refer only to a heavily fished 
population and not to the relatively unfished populations of early days, 
or to the populations which would result if the sudden cessation of fish- 
ing were to continue for several years. In such populations, natural 
mortality would probably be greater yet, for such reasons as poorer 
nutrition of the larger stock, greater average age resulting in more 
deaths from senility, and so on. 

This general situation is to be studied by various lines of approach in 
future studies. From the present study, however, we may conclude that 
the number of haddock caught in various years by the New England 
fleet markedly affected the subsequent population of haddock of corre- 
sponding ages on Georges Bank. Although it is generally assumed in 
many fisheries that the fishery does affect the stock, instances where such 
an effect has been demonstrated clearly are extremely rare. This 
analysis, in addition to demonstrating this relationship, is also of con- 
siderable value in providing the basic data that can be used in determin- 
ing many other very important facts necessary for a broad understanding 
of the biometrics of the valuable New England haddock resource. Such 
facts include the actual number of fish present on the bank, fishing and 
natural mortality rates, growth rates, indices of the recruitment of 
young, the effect of various factors upon recruitment, and predictions 
as to the future abundance of this species. Investigations of these re- 
lationships are now being undertaken and will be reported upon soon. 
ONE DEGREE OF FREEDOM FOR NON-ADDITIVITY*†


JOHN W. TUKEY


Princeton University 


INTRODUCTION 


IN DISCUSSING the possible shortcomings of the analysis of variance,
much attention has been paid to non-constancy and non-normality of
the "error" contribution. (The recent papers in Biometrics by Eisenhart
[4], Cochran [3] and Bartlett [1] discuss these matters and give
references.) The present writer is usually much more concerned with and
worried about non-additivity, and until recently has suffered from the
lack of a systematic way to seek it out, and then to measure it.
(Conversations with Frederick F. Stephan have contributed greatly to this
development and presentation.)

The purpose of the present paper is to indicate such a way, when the 
data is in the form of a row-by-column table. (The professional practi- 
tioner of the analysis of variance will have no difficulty in extending the 
process to more complex designs.) We shall show how to isolate one
degree of freedom from the "residue", "error", "interaction" or
"discrepance", call it what you will. There are two known situations to
which this single degree of freedom is expected to react by swelling: 


(1) when one or more observations are unusually discrepant; 
(2) when the analysis has been conducted in terms where 
the effects of rows and columns are not additive. 


The first situation is quite familiar and requires little explanation. The 
second occurs often enough, but may not be noticed. An example may 
help to fix the ideas. 

Let us construct an artificial example with 3 rows and 4 columns, 
with each entry contributed to overall, by rows, by columns, and by 
cells. Suppose that these contributions are as follows: 


*Prepared in connection with research sponsored by the Office of Naval Research. 
†Presented to the Biometrics Section and the Biometric Society at Cleveland, December 29, 1948.
    in general          by rows          by columns          by cells

   1   1   1   1      4   4   4   4      6   1  -4   0      1  -2   1   0
   1   1   1   1     -3  -3  -3  -3      6   1  -4   0      0  -1   2  -3
   1   1   1   1     -3  -3  -3  -3      6   1  -4   0      0  -2  -1   0


Then the tables and corresponding analyses for the sum of all contribu- 
tions are: 


TABLE 1
ILLUSTRATIVE EXAMPLE IN ORIGINAL TERMS

            Values and Means                      Analysis of Variance
                                Sums  Means
          12     4     2     5 |  23    5.8                  DF    SS    MS
           4    -2    -4    -5 |  -7   -1.8
           4    -3    -7    -2 |  -8   -2.0
                                                  Rows        2   140    70
Sums      20    -1    -9    -2 |   8              Columns     3   157    52
Means    6.7  -0.3  -3.0  -0.7 |        0.7       R × C       6    26     4


Now let us square the entries and divide by 10, rounding to integers. 
The resulting tables and analyses are: 


TABLE 2
ILLUSTRATIVE EXAMPLE IN TERMS OF SQUARES

            Values and Means                      Analysis of Variance
                                Sums  Means
          14     2     1     2 |  19    4.8                  DF    SS    MS
           2     0     2     2 |   6    1.5
           2     1     5     0 |   8    2.0
                                                  Rows        2   24.5  12.2
Sums      18     3     8     4 |  33              Columns     3   46.9  15.6
Means    6.0   1.0   2.7   1.3 |        2.8       R × C       6   84.8  14.1


Notice that all semblance of row or column effects has now vanished,
although Table 1 showed large and significant effects. The use
of the squared scale has concealed the real effects. (It may be argued
that squaring numbers which range from plus to minus is unrealistic.
The answer is that this is an extreme example, but one that can be slowly
and smoothly changed into a very mild one. There probably is a difference
in degree between this example and what happens in practice, but
there is no difference in kind.) 
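
The analysis of variance for the squared scale can be rechecked with a short routine. This is a sketch of the ordinary two-way breakdown without replication, written in plain Python; the function name `two_way_anova` is illustrative, and the entries are those of Table 2 (the third row read as 2, 1, 5, 0, which agrees with its row sum of 8).

```python
def two_way_anova(table):
    """Sums of squares for rows, columns and R x C in an r-by-c table."""
    r = len(table)
    c = len(table[0])
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (r * c)
    ss_rows = sum(sum(row) ** 2 for row in table) / c - correction
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / r - correction
    ss_total = sum(v * v for row in table for v in row) - correction
    ss_rxc = ss_total - ss_rows - ss_cols
    return {
        "rows": (r - 1, ss_rows),
        "columns": (c - 1, ss_cols),
        "rxc": ((r - 1) * (c - 1), ss_rxc),
    }


squares = [[14, 2, 1, 2],
           [2, 0, 2, 2],
           [2, 1, 5, 0]]

for name, (df, ss) in two_way_anova(squares).items():
    print(name, df, round(ss, 1), round(ss / df, 1))
```

The printed lines give degrees of freedom, sum of squares and mean square for each source, for comparison with the right-hand side of Table 2.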


PROCEDURE 


How then do we isolate the single degree of freedom? The process 
is simple, and runs as follows: 


(A) To the row-by-column table, already bordered with sums and means, 
add a new border of deviations of means from the grand mean 
(decimal places may be reduced, but the sums of deviations, by 
rows and by columns must be forced to vanish). 

(B) Add an extra column (or row) and enter in each cell the sum of 
products of the deviations by columns and the entries in its row 
(or column). 

(C) Accumulate the sum of products between the deviations of row 
(or column) means and the new entries of (B). 

(D) Calculate the sum of squares of deviations by columns and by rows. 

(E) Divide the square of the number from (C) by the product of the 
numbers from (D). This is the mean square (and also the sum of 
squares) for the single degree of freedom. 


The process is illustrated on the same example below: 


TABLE 3
SAMPLE CALCULATION

                                                  Devia-    Sums of
                                  Sums   Means    tions     x-products

             14      2      1      2 |  19    4.75     2.0      38.4
              2      0      2      2 |   6    1.50    -1.2       3.6
              2      1      5      0 |   8    2.00    -0.8       4.6
Sums         18      3      8      4 |  33             0.0      68.8
Means      6.00   1.00   2.67   1.33 |        2.75    6.08

Deviations  3.2   -1.8    0.0   -1.4 | 0.0           15.44      50.9


(B): 14(3.2) + 2(-1.8) + 1(0.0) + 2(-1.4) = 38.4
      2(3.2) + 0(-1.8) + 2(0.0) + 2(-1.4) =  3.6
      2(3.2) + 1(-1.8) + 5(0.0) + 0(-1.4) =  4.6
(C): 38.4(2.0) + 3.6(-1.2) + 4.6(-0.8) = 68.8
(D): (3.2)² + (-1.8)² + (0.0)² + (-1.4)² = 15.44
     (2.0)² + (-1.2)² + (-0.8)² = 6.08
(E): (68.8)² / [(15.44)(6.08)] = 50.9

Assigning the mean square 50.9 to the degree of freedom for non-additivity,
which is subtracted from "R × C", the analysis of variance of
Table 2 becomes:


                    DF      SS      MS
Rows                 2     24.5    12.2
Columns              3     46.9    15.6
Non-additivity       1     50.9    50.9
Balance              5     33.9     6.8


Thus the obvious thing about the illustrative example was its non-addi- 
tivity. The corresponding F value of 7.3 on 1 and 5 degrees of freedom 
is significant at the 5% level. 
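
Steps (B) to (E) translate directly into code. The sketch below is one possible rendering, not a definitive implementation; the function name `nonadditivity_ss` is illustrative, and the deviations are the rounded values of Table 3. Carried out in floating point with these rounded deviations, step (E) gives about 50.4, close to the 50.9 quoted above.

```python
def nonadditivity_ss(table, row_dev, col_dev):
    """One-degree-of-freedom sum of squares for non-additivity.

    table   -- list of rows of the row-by-column table
    row_dev -- deviations of row means from the grand mean
    col_dev -- deviations of column means from the grand mean
    """
    # (B) sum of products of each row's entries with the column deviations
    cross = [sum(v * d for v, d in zip(row, col_dev)) for row in table]
    # (C) sum of products of the row deviations with the entries of (B)
    final_sum = sum(x * d for x, d in zip(cross, row_dev))
    # (D) sums of squares of the deviations, by columns and by rows
    ss_col_dev = sum(d * d for d in col_dev)
    ss_row_dev = sum(d * d for d in row_dev)
    # (E) square of (C) divided by the product of the two numbers from (D)
    return cross, final_sum, final_sum ** 2 / (ss_col_dev * ss_row_dev)


table2 = [[14, 2, 1, 2],
          [2, 0, 2, 2],
          [2, 1, 5, 0]]
row_deviations = [2.0, -1.2, -0.8]       # rounded, forced to sum to zero
col_deviations = [3.2, -1.8, 0.0, -1.4]  # rounded, forced to sum to zero

cross, final_sum, ss_nonadd = nonadditivity_ss(table2, row_deviations, col_deviations)
print([round(x, 1) for x in cross], round(final_sum, 1), round(ss_nonadd, 1))
```

The sign of `final_sum` is worth keeping as well as its square, since the sign is what points toward a type of transformation.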


EXPLANATION 


We have explained what we are looking for—non-additivity—and 
how to look—last section—but we have not explained what we are really 
doing. This we shall now try to do. Those experienced with single 
degrees of freedom may have already recognized the computation as a 
short-cut method of eliminating the single degree of freedom labeled by 


                                             Row
                                             deviations
      6.40    -3.60     0.00    -2.80    |     2.0
     -3.84     2.16     0.00     1.68    |    -1.2
     -2.56     1.44     0.00     1.12    |    -0.8


where 6.40 = (2.0)(3.2), -3.60 = (-1.8)(2.0), 2.16 = (-1.8)(-1.2)
and so on. We have used the products of the deviations of the row means
and the deviations of the column means to label this single degree of
freedom. Since the sum of each column and of each row is zero, this
degree of freedom is orthogonal to rows and to columns. It must be a
part of "R × C". This is what we did, but why?
Let us take a special case, where there are row contributions, and
column contributions, and nothing else. We start with perfect additivity.
If $x_j$ is the column contribution (where $j$ goes from 1 to $c$, the number of
columns), and if $y_i$ is the row contribution (where $i$ goes from 1 to $r$, the
number of rows), then the $ij$ entry in the table is

\[ a_{ij} = x_j + y_i . \]

Now let us start to analyze a slightly nonlinear function of the $a_{ij}$.
Instead of $a_{ij}$, consider

\[ f_\lambda(a_{ij}) = a_{ij} + \lambda (a_{ij} - \bar{a})^2 \]

where $\lambda$ is a small constant, and $\bar{a}$ is, for convenience, the average
$\bar{x} + \bar{y}$ of all the $a_{ij}$. We find that we can write

\[ f_\lambda(a_{ij}) = [x_j + \lambda (x_j - \bar{x})^2] + [y_i + \lambda (y_i - \bar{y})^2] + 2\lambda (x_j - \bar{x})(y_i - \bar{y}) . \]

The first two terms depend, respectively, on the column alone and on the
row alone, so the last one contains all the non-additive effect due to
analysis in terms of $f(a)$ instead of in terms of $a$. Notice that this
non-additive effect is a multiple of

\[ (x_j - \bar{x})(y_i - \bar{y}) . \]

This means that it occurs in a single degree of freedom, which is identified
in terms of $x_j - \bar{x}$ and $y_i - \bar{y}$.

We assumed no error of measurement, or the like, and we wrote
$a_{ij} = x_j + y_i$ without an additional term. This means that the
difference between the $j$-th column mean and the grand mean is

\[ (x_j - \bar{x}) + \lambda \{ (x_j - \bar{x})^2 - \overline{(x - \bar{x})^2} \} \]

which is nearly $x_j - \bar{x}$ when $\lambda$ is small. Thus a satisfactory
approximation to the single degree of freedom we want is that indicated by the
coefficients

(column mean — grand mean)(row mean — grand mean).

This is exact for the combination of no error and a very slight change
from $a$ to $f(a)$, that is for no error and $\lambda$ small. This fact plus empirical
tests seems enough to warrant recommending general use of this single
degree of freedom as a test of non-additivity.
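
The algebra of this section can be checked numerically. The sketch below builds a perfectly additive table from arbitrary (hypothetical) row and column contributions, applies the slightly nonlinear function with a small constant, removes row and column means, and compares what is left with twice the constant times the product of the row and column deviations. All names and numbers are illustrative assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def double_centered(table):
    """Residuals after removing row means and column means (grand mean restored)."""
    row_means = [mean(row) for row in table]
    col_means = [mean(col) for col in zip(*table)]
    grand = mean(row_means)
    return [[v - rm - cm + grand for v, cm in zip(row, col_means)]
            for row, rm in zip(table, row_means)]


x = [1.0, 4.0, -2.0, 0.5]   # hypothetical column contributions
y = [3.0, -1.0, 2.0]        # hypothetical row contributions
lam = 0.01                  # the small constant of the text

additive = [[xj + yi for xj in x] for yi in y]
a_bar = mean(x) + mean(y)
transformed = [[v + lam * (v - a_bar) ** 2 for v in row] for row in additive]

residual = double_centered(transformed)
expected = [[2 * lam * (xj - mean(x)) * (yi - mean(y)) for xj in x] for yi in y]

max_gap = max(abs(res - exp)
              for res_row, exp_row in zip(residual, expected)
              for res, exp in zip(res_row, exp_row))
print(max_gap)  # zero apart from floating-point rounding
```

Every residual of the transformed table matches the product term, which is the sense in which the non-additivity sits in a single degree of freedom.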


WHAT OF SIGNIFICANCE? 


Suppose that the test shows statistically significant evidence of 
non-linearity—what then? The simplest and laziest thing to do would 
be to forget the degree of freedom for non-additivity and go on and use
the mean square for the balance in considering, for example, the
significance of the row effects. This is not recommended, for the following
reasons: 


(1) In general, results expressed in terms in which effects are 
additive apply in a broader region and are practically 
more useful. 

(2) If the “error” or fluctuating contribution is not normally 
distributed, then it is not known whether or not the use 
of the balance mean square unduly inflates the apparent 
significance of other mean squares (for the case of a nor- 
mally distributed fluctuating contribution there is no 
distortion of significance.) 


For these reasons, the occurrence of a large non-additivity mean square 
should lead to consideration of a transformation followed by a new 
analysis of the transformed variable. 


This consideration should include two steps: 


(a) inquiry whether the non-additivity was due to analysis 
in the wrong form or to one or more unusually discrepant 
values; 

(b) in case no unusually discrepant values are found or indi- 
cated, inquiry into how much of a transformation is 
needed to restore additivity.


The decision under (a) will depend on an examination of the data and 
all the background information available in the field—in particular the 
result of similar inspections of other experiments for non-additivity. 
What seems to be the best way of inspecting the results of a single experi- 
ment so far proposed is to plot the entries in the new column (of sums of 
cross-products) against the corresponding row means. A single unusu- 
ally discrepant observation will tend to be reflected by one point high 
or low and the others distributed around a nearly horizontal regression 
line. An analysis in the wrong terms will tend to be reflected by a 
slanting regression line. 


The figure shows such a plot, including 2s limits, for

(A) the illustrative example worked above,
(B) Youden and Beale's data [6] as simplified by Snedecor
    [5, p. 44],
(C) Beall's experiment VI [2] on insect infestation, with plots
    treated alike combined (analyzed in terms of numbers of
    insects),
(D) Cochran's example [3] of an obviously discrepant value.


FIGURE. GRAPHICAL ANALYSIS OF NONADDITIVITY
(Ordinates are Sums of Cross Products, Dashed Lines are 2s Limits)
Abscissas: A (Illustrative), mean of row; B (Youden & Beale), mean number
of lesions per leaf; C (Beall), mean number of insects per plot-pair;
D (Cochran), mean ratio of dry to wet grain.

The limits are set by the formula

\[ \left(\text{average cross product}\right) \pm 2 \left[ \left(\text{sum of squares of deviations of column means}\right) \left(\text{mean square for balance}\right) \right]^{1/2} \]

For the illustrative example (Case A), this becomes

\[ 15.53 \pm 2\,(15.44)^{1/2}(6.8)^{1/2} = 15.5 \pm 20.5 = -5.0 \text{ and } +36.0 . \]
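
The arithmetic for Case A can be reproduced as follows. This is a small sketch using the numbers already given for the illustrative example (cross products from Table 3, the sum of squares 15.44, and the balance mean square 6.8); the variable names are illustrative.

```python
import math

cross_products = [38.4, 3.6, 4.6]   # new column of Table 3, one entry per row
ss_column_deviations = 15.44        # sum of squares of column-mean deviations
ms_balance = 6.8                    # mean square for balance

center = sum(cross_products) / len(cross_products)
half_width = 2 * math.sqrt(ss_column_deviations * ms_balance)

lower, upper = center - half_width, center + half_width
print(round(center, 2), round(half_width, 1), round(lower, 1), round(upper, 1))
```

Points falling outside the band from `lower` to `upper` are the ones the plot flags for attention.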


In every one of the four cases, the plotted points could be accounted 
for by non-additivity due to analysis in incorrect terms. Cases A and D 
can also be accounted for by a discrepant point. This suggests that it 
will be hard to make this distinction for single experiments on this scale. 
When several small experiments are available for analysis, agreement in 
signs of the slopes of the graphs or equivalently, the signs of the sums 
obtained in Step C may show up analysis in incorrect terms. 

Why does the graph fail to decide about Cases A and D? The reason 
is simple—either explanation is plausible. If in Case A we alter the 
upper left-hand entry from 14 to 2, the analysis of variance becomes: 


                    DF      SS      MS


Rows                 2      0.5     0.2
Columns              3      4.9     1.6
Non-additivity       1      0.2     0.2
Balance              5     12.6     2.5


Thus we see that our illustrative table of 3 X 4 entries could have per- 
fectly well come from an additive situation where exactly one entry has 
been seriously disturbed. 

Similarly in Case D, taken from Cochran's paper, if a nonlinear
function is chosen so that

\[ g(y) = \begin{cases} y, & .704 \le y \le .792, \\ .800, & y = 1.035, \end{cases} \]

then his table is converted into one where the F-ratio for non-additivity
against balance is 0.8 instead of 27.6. We know that this table arose
from an error in computation, but it could equally well have come from
an additive table analyzed in the wrong terms.

In each case, the graphical solution has gone as far as it reasonably 
could in assigning responsibility for the non-additivity. While the
graphical analysis is not certain to settle Step (a), it may be expected 
to be a big help. 


AID IN CHOOSING A TRANSFORMATION 


If it has been decided that the wrong terms had been used, then the 
actual size of the mean square for non-additivity must be useful for 
choosing an appropriate transformation. We lack experience with the 
more delicate use of such information, so that it seems appropriate to 
stop here with the following table which shows the connection between 
the sign of the final sum of products (which was +68.8 in the illustrative
example) and the type of transformation which may then be appropriate.


TABLE 4

SIGN OF FINAL SUM OF PRODUCTS WHEN CERTAIN TRANSFORMATIONS
ARE APPROPRIATE (VALUES OF x OR x + a NON-NEGATIVE)

Transformed
values which          Conditions      Sign when x      Important
are additive*         needed          is analyzed      special cases

x^p or (x + a)^p      0 < p < 1            +           √x, √(x + 1)
                      p < 0                +           1/x
                      p > 1                -           x², x³
log (x + a)           (none)               +           log x, log (1 + x)

*Multiplication by a fixed constant and addition or subtraction of a fixed constant freely possible


While the removal of non-additivity by transformation usually tends 
to stabilize the variance, there may be cases where the variance is no- 
tably non-constant after transformation. In such cases, analysis of the 
transformed data using weights seems appropriate. 


APPENDIX 
VALIDITY OF THE ANALYSIS 


This section is prepared for those who may feel that the method of
obtaining the "single degree of freedom" may not produce quantities
with the usual distribution.

The basic fact is this: If $u_1, u_2, \ldots, u_n$; $v_1, v_2, \ldots, v_m$ have some
joint distribution, and if, for fixed $u_1, u_2, \ldots, u_n$, the distribution of
$v_1, v_2, \ldots, v_m$ exists and is always the same, then the marginal distribution
of $v_1, v_2, \ldots, v_m$ exists and, indeed, is the same, and, furthermore,
$u_1, u_2, \ldots, u_n$ and $v_1, v_2, \ldots, v_m$ are independent. This can be
established either by general considerations or by analytical detail.

To apply this in our case, let $u_1, u_2, \ldots, u_n$ be the row and column
means, and let $v_1$ and $v_2$ be the sums of squares for non-additivity and
for the balance. If the situation is additive, and the cell effects are
normally distributed, and $u_1, u_2, \ldots, u_n$ are fixed, then $v_1$ and $v_2$ are
independently distributed like $\sigma^2$ times chi-squares on 1 and $rc - r - c$
degrees of freedom. Hence $v_1$ and $v_2$ have these distributions, and are
independent of all functions of row and column means. Thus the F-tests
of rows, columns, or non-additivity against balance are valid.

In the presence of non-additivity and/or non-normality, the usual 
arguments indicate that the F-test is, if anything, conservative. 
ON A STATISTICAL APPROXIMATION TO THE
INFECTION INTERVAL


J. B. CHASSAN*


IN A PREVIOUS PAPER (2) the existence of strong correlation between the
logarithms of the morbidity rates of a group of respiratory diseases
for successive calendar month-pairs was demonstrated. The case rates
involved pertain to the combined incidence of catarrhal bronchitis, acute
coryza, acute catarrhal pharyngitis and laryngitis, and influenza, as
diagnosed and reported in the United States Army. Where $C_i$ is the
case rate observed in the $i$-th calendar month, and $C_{i+1}$ the corresponding
rate observed in the succeeding calendar month of the same year
(or the same winter when $i$ = December), the value of $r_{\log C_i,\ \log C_{i+1}}$ for
the twelve month-pairs averaged .84, each of the twelve coefficients
being based upon some 38 observations, according to the number of
years for which data were available for each month-pair. The purpose 
of the present paper is to relate some of the results obtained in connec- 
tion with ref. (2) to the law of mass action in epidemiology, and to derive 
therefrom an estimate of the infection interval for an assumed period of 
immunity following infection, or conversely, an estimate of the period of 
immunity corresponding to a known infection interval. In connection 
with the actual numerical values presented, it should be noted that they 
pertain to a group of diseases and therefore can be interpreted only as 
average for the group as a whole. 

The law of mass action in epidemiology states that the rate at which a 
contagious or epidemic disease spreads in a community is proportional 
to the product of the number of infectious individuals and the number of 
susceptibles in the community. If two consecutive time intervals are 
chosen such that the length of each interval is equal to the period 
between contact and case manifestation (i.e., the incubation period), a
contact between an infectious person and a susceptible in the first 
interval will result in a new case during the second. Then the law of
mass action may be written as

\[ C_{i+1} = \frac{1}{m} S_i I_i \qquad (1) \]

in which

(a) $C_{i+1}$ is the expected number of cases (or the case rate) during the
    $(i + 1)$-th period.

(b) $S_i$ is the average number of susceptibles in the $i$-th period.

(c) $I_i$ represents the average number of infectious individuals during
    the $i$-th period.

(d) $m^{-1}$ is the proportionality constant reflecting such factors as the
    degree of crowding in a community, seasonality; more abstractly
    "infective power".


*The author wishes to acknowledge the helpful criticism and suggestions of Prof. John W. Tukey
of Princeton University in connection with this paper.


For the case in which the period of communicability following infec- 
tion is relatively short, it is convenient to consider incidence in successive 
intervals whose lengths are each equivalent to the infection interval, 
rather than to the incubation period. The infection interval may be 
defined as the average period between the manifestations of two cases, 
one case resulting from contact with the other. It can be regarded as the 
sum of two components: first, the (average) time it takes for adequate 
contact to take place between a newly infected person and a susceptible, 
and then, the period between contact and manifestation. In such a 
case, we may replace $I_i$ by $C_i$ in equation (1), obtaining Soper's formula,

\[ C_{i+1} = \frac{1}{m} S_i C_i \qquad (2) \]


which gives the relationship between incidence rates in two consecutive 
periods whose lengths are each equal to that of the infection interval. 

Soper (1) has also stated the relationship for the case in which the
incidence rates are taken over successive periods of arbitrary length. If
$C_i$ is the case rate observed during the $i$-th month, and $S_i$ is the average
number of susceptibles in the $i$-th month, then

\[ C_{i+1} = \left( \frac{S_i}{m} \right)^{p} C_i \qquad (3) \]

where $C_{i+1}$ is the incidence rate in the $(i + 1)$-th month, and $p$ represents
the number of infection intervals in one month.

If $C_i$ is expressed as a daily incidence rate in terms of the number
infected per day out of each 1000 population, and if $n$ is the number of
days of immunity following infection, then $nC_i$ will give the average
number per 1000 population who are not susceptible, by virtue of recent
infection, during the month in which $C_i$ is observed. On the assumption
of general susceptibility in the population, the corresponding number of
susceptibles per 1000 will then be given by

\[ S_i = 1000 - nC_i \qquad (4) \]


Substituting this value in (3), we obtain 


Sr (=) (1000 — n@;)’C; or 


log Cis: = p log m™ + p log (1000 — nC) + log C; (5) 
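Equations (4) and (5) can be checked numerically. The following is only a
minimal sketch: the function names and the input values are hypothetical
and are not taken from the paper's data. It evaluates the product form and
the logarithmic form of (5) for the same inputs.

```python
import math

POPULATION = 1000.0  # rates are expressed per 1000 population, as in (4)


def susceptibles(c_i, n):
    """Equation (4): S_i = 1000 - n * C_i."""
    return POPULATION - n * c_i


def next_case_rate(c_i, n, p, m):
    """Equation (5), product form: C_{i+1} = (1/m)^p (1000 - n C_i)^p C_i."""
    return (susceptibles(c_i, n) / m) ** p * c_i


def log_next_case_rate(c_i, n, p, m):
    """Equation (5), logarithmic form (base-10 logarithms)."""
    return (p * math.log10(1.0 / m)
            + p * math.log10(susceptibles(c_i, n))
            + math.log10(c_i))


# Hypothetical inputs, chosen only to exercise the formulas.
c_i, n, p, m = 2.0, 21, 15, 950.0
c_next = next_case_rate(c_i, n, p, m)
print(c_next, math.log10(c_next), log_next_case_rate(c_i, n, p, m))
```

The two forms should agree to rounding error, which is a quick way of
confirming that the exponent $p$ is applied to both $m^{-1}$ and the
number of susceptibles.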


Interpreting this equation in a statistical sense, i.e., as a regression
function in which $x_{i+1} = \log C_{i+1}$ is regarded as the average value
corresponding to a fixed observation of $x_i = \log C_i$, the data for the
group of respiratory diseases under consideration indicate that the true
regression curve of $x_{i+1}$ on $x_i$ increases monotonically with slowly
declining slope over the actual range of observations. Apart from sampling
differences, a straight line of the form


$$x_{i+1} = a + b x_i \qquad (6)$$


fitted by the method of least squares should lie close to the regression
curve over the range of observed values of $x_i$, and the slope of the line,
$b$, should very nearly equal the slope, $\beta$, of the secant which
intersects the true regression curve at points corresponding to the lowest
and highest of the observed values of $x_i$, respectively. An approximation
to the infection interval can then be obtained by equating $b$, the slope of
the linear regression of $x_{i+1}$ on $x_i$, to $\beta$, the slope of the
secant.

If $C_{i_1}$ represents the lowest of the observed rates in the $i$-th month,
and $C_{i_2}$ the highest, their substitution, in turn, for $C_i$ in
equation (5) above yields, as co-ordinates of the secant at the points of
intersection with the mass action curve, the points


$$\left(\log C_{i_1},\ \log\left[\left(\frac{1}{m}\right)^{p} (1000 - nC_{i_1})^{p}\, C_{i_1}\right]\right)$$


and


$$\left(\log C_{i_2},\ \log\left[\left(\frac{1}{m}\right)^{p} (1000 - nC_{i_2})^{p}\, C_{i_2}\right]\right)$$


respectively, where each ordinate is expressed as a function of the
corresponding abscissa.
Then, from elementary analytic geometry, the slope of the secant will be


$$\beta = 1 + p\,\frac{\log\left(\dfrac{1000 - nC_{i_2}}{1000 - nC_{i_1}}\right)}{\log\left(C_{i_2}/C_{i_1}\right)} \qquad (7)$$


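Solving for $p$, and substituting $b$ for $\beta$,


$$p = \frac{(b - 1)\,\log\left(C_{i_2}/C_{i_1}\right)}{\log\left(\dfrac{1000 - nC_{i_2}}{1000 - nC_{i_1}}\right)} \qquad (8)$$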


Upon applying formula (8) to the data of reference (2) for the twelve
month-pairs, a median value of $p = 15$ was obtained on the assumption
of three weeks of incidence as equivalent to the number of non-susceptibles,
i.e., when $n = 21$. Since the incidence data were taken over
monthly intervals, the corresponding estimate of the average infection
interval is 2.0 days. On the assumption that $n = 28$, the median value
of $p$ is 11, and the infection interval, 2.8 days. Finally, if the
assumption is made that $n = 42$, an average infection interval of 4.1 days
is estimated. Thus in the neighborhood of the assumed values of $n$, the
ratio of the period of immunity to the length of the infection interval is
approximately 10 : 1.
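The step from a fitted slope to an infection interval can be sketched in
code. The function names, the rates 1.0 and 5.0 per 1000, and the 30-day
month are assumptions introduced here for illustration; the paper's own
data from reference (2) are not reproduced. The sketch computes the secant
slope from (7), then recovers $p$ from (8) as a round-trip check.

```python
import math

POPULATION = 1000.0


def secant_slope(p, n, c_low, c_high):
    """Equation (7): slope of the secant through the two extreme points."""
    ratio = (POPULATION - n * c_high) / (POPULATION - n * c_low)
    return 1.0 + p * math.log(ratio) / math.log(c_high / c_low)


def intervals_per_month(b, n, c_low, c_high):
    """Equation (8): p obtained by putting the fitted slope b in place of beta."""
    ratio = (POPULATION - n * c_high) / (POPULATION - n * c_low)
    return (b - 1.0) * math.log(c_high / c_low) / math.log(ratio)


def infection_interval_days(p, days_per_month=30.0):
    """Length of one infection interval; the 30-day month is an assumption."""
    return days_per_month / p


# Hypothetical extreme daily rates per 1000, with n = 21 days of immunity.
beta = secant_slope(p=15, n=21, c_low=1.0, c_high=5.0)
p_back = intervals_per_month(beta, n=21, c_low=1.0, c_high=5.0)
print(beta, p_back, infection_interval_days(p_back))
```

With a 30-day month, $p = 15$ corresponds to the 2.0 days quoted above. The
ratio of two logarithms does not depend on the base, so natural logarithms
are used throughout.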
Illustrating the procedure graphically, the chart given shows:


(a) a plotting of observed points corresponding to the February-March
relationship

(b) a theoretical drawing of (5), represented by the curve MM’, and
interpreted as a regression curve

(c) the least squares linear regression of $x_{i+1}$ on $x_i$, LL’, fitted
to the scatter of points

(d) a secant to the curve MM’, drawn as a dashed line; the secant
is drawn so that it intersects the curve to the left at the point
whose abscissa is $\log C_{i_1}$, where $C_{i_1}$ is the smallest of the
observed values of $C_i$, and to the right, at the point whose abscissa
is $\log C_{i_2}$, where $C_{i_2}$ is the largest of the observed values
of $C_i$.


[Figure: Graphical representation of the estimating relationships in the
approximation to the infection interval. Horizontal axis: $x_i = \log C_i$
(February incidence), with $\log C_{i_1}$ and $\log C_{i_2}$ marked;
vertical axis: $x_{i+1} = \log C_{i+1}$ (March incidence). The line of
least squares is shown. Note on the chart: the secant to the law of mass
action curve, intersecting the curve at the extremes of $x_i$, is assumed
parallel to the least squares line of regression.]


The position of the curve MM’ in relation to its secant and to the
least squares line LL’ (again, apart from sampling errors) can be
determined by formulating the vertical distance between MM’ and the
secant. By differentiation, both the maximum distance and the value
of $\log C_i$ at which the maximum distance occurs can be determined.
Thus if the equation of the secant is given by


$$\log C_{i+1} = \alpha + \beta \log C_i, \qquad (9)$$


it will be found, by substituting $C_i = C_{i_1}$ in equation (9) and in (5),
that


$$\alpha = p \log m^{-1} + p \log (1000 - nC_{i_1}) + (1 - \beta) \log C_{i_1}.$$


The distance from the secant to the curve will then be


$$\phi = p \log \frac{1000 - nC_i}{1000 - nC_{i_1}} + (1 - \beta) \log \frac{C_i}{C_{i_1}} \qquad (10)$$


where $b$ can be substituted for $\beta$, and $p$ is obtained from (8).
The maximum value of $\phi$ can, of course, be obtained by differentiation
with respect to $C_i$, or $\log C_i$, and equating to zero. Then the curve
MM’ can be closely approximated from the position of the least squares
line. Taking

$$M = L + \phi - \tfrac{1}{2} \max \phi$$

where, for a fixed value of $\log C_i$, $L$ is the corresponding value of
$\log C_{i+1}$ on the least squares line, and $\phi$ is taken from (10),
$M$ is the corresponding value of $\log C_{i+1}$ on the curve.
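The distance (10) and the location of its maximum can be sketched as
follows. The closed form for the maximizing rate is worked out here by
setting the derivative of (10) with respect to $\log C_i$ to zero, which
gives $C^{*} = 1000(1 - \beta) / [n(p + 1 - \beta)]$; it is not stated in
the text. The function names and numerical inputs are hypothetical.

```python
import math

POPULATION = 1000.0


def secant_slope(p, n, c_low, c_high):
    """Equation (7)."""
    ratio = (POPULATION - n * c_high) / (POPULATION - n * c_low)
    return 1.0 + p * math.log10(ratio) / math.log10(c_high / c_low)


def phi(c_i, c_low, beta, p, n):
    """Equation (10): vertical distance from the secant to the curve."""
    return (p * math.log10((POPULATION - n * c_i) / (POPULATION - n * c_low))
            + (1.0 - beta) * math.log10(c_i / c_low))


def rate_at_max_phi(beta, p, n):
    """Rate C* at which d(phi)/d(log C_i) = 0 (derived here, not in the text)."""
    return POPULATION * (1.0 - beta) / (n * (p + 1.0 - beta))


def curve_from_least_squares(l_value, phi_value, phi_max):
    """M = L + phi - (1/2) max phi."""
    return l_value + phi_value - 0.5 * phi_max


# Hypothetical inputs: p = 15, n = 21, extreme rates 1.0 and 5.0 per 1000.
p, n, c_low, c_high = 15, 21, 1.0, 5.0
beta = secant_slope(p, n, c_low, c_high)
c_star = rate_at_max_phi(beta, p, n)
phi_max = phi(c_star, c_low, beta, p, n)
print(c_star, phi_max)
```

By construction the distance vanishes at both ends of the secant, so the
maximum falls strictly between the lowest and highest observed rates.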
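For an assumed value of $p$, equation (8) can be solved for $n$. From


$$\left(\frac{C_{i_2}}{C_{i_1}}\right)^{(b-1)/p} = \frac{1000 - nC_{i_2}}{1000 - nC_{i_1}}$$


we obtain


$$n = \frac{1000\left[1 - \left(\dfrac{C_{i_2}}{C_{i_1}}\right)^{(b-1)/p}\right]}{C_{i_2} - C_{i_1}\left(\dfrac{C_{i_2}}{C_{i_1}}\right)^{(b-1)/p}}$$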
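The solution for $n$ can be checked against equation (8) by a round trip.
This is a sketch under assumed inputs; the function names and the numerical
values are hypothetical.

```python
import math

POPULATION = 1000.0


def intervals_per_month(b, n, c_low, c_high):
    """Equation (8): p for given slope b and immunity period n."""
    ratio = (POPULATION - n * c_high) / (POPULATION - n * c_low)
    return (b - 1.0) * math.log(c_high / c_low) / math.log(ratio)


def immunity_days(b, p, c_low, c_high):
    """Equation (8) solved for n, for an assumed value of p."""
    r = (c_high / c_low) ** ((b - 1.0) / p)
    return POPULATION * (1.0 - r) / (c_high - c_low * r)


# Hypothetical slope and extreme rates; p is assumed, n is solved for.
b, p, c_low, c_high = 0.2, 12.0, 1.0, 5.0
n_days = immunity_days(b, p, c_low, c_high)
print(n_days, intervals_per_month(b, n_days, c_low, c_high))
```

Feeding the solved $n$ back into (8) should return the assumed $p$.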


In applying the foregoing type of analysis the following modifications 
or limitations should be considered: 

(i). We have assumed that for a fixed month-pair the infectivity
factor, $m^{-1}$, is constant, except for random variation. Were it not for
the fact of a declining number of susceptibles, $S_i$, with increasing
$C_i$, as described by equation (4) above (i.e., if $S_i$ were constant over
the range of $C_i$), the mass action curve as given by equation (5) above
would assume linear form with slope unity. But the declining value of $S_i$
has the effect of causing the slope to drop with increasing $C_i$, so that,
apart from sampling errors, the slope $b$ (and $\beta$) will be less than
unity. This can be seen quite easily if we write equation (3) as


$$C_{i+1} = A_i C_i, \qquad A_i = \left(\frac{S_i}{m}\right)^{p}.$$


Then, if $A_i$ were constant for all $C_i$, a plotting of the curve (on a
log-log scale) would yield a straight line parallel to


$$C_{i+1} = C_i$$


at a vertical distance of $\log A_i$. But with the damping effect of the
decline of susceptibles as $C_i$ increases, $A_i$ correspondingly decreases;
and if, for example, $\log A_i$ is still positive, the distance between the
two lines decreases with increasing $C_i$, and it then follows that
$\beta < 1$. The same result would, of course, apply when $\log A_i$ is
negative.

Now if the situation were such that as $C_i$ increases various preventive
measures are taken which significantly reduce the infectivity factor,
further damping will take place, and $b$ will become smaller. To take
this into account it would then be necessary to adjust upward the value
of $b$, resulting in a corresponding increase in the length of the infection
interval for each of the assumed values of $n$.

(ii). Equation (4) above implies that the entire population is potentially
susceptible, and that the only immunes present at any given time
are those individuals who have gained immunity for a short period by
virtue of recent infection. If, however, only a fraction, $q$, of the entire
population are potentially susceptible, then instead of (4), it would be
necessary to write


$$S_i = 1000q - nC_i \qquad (11)$$


and substituting this, instead of (4), in (5), and in (7) and (8), it will be
seen that for the same observed value of $b$, a somewhat longer infection
interval would be estimated, depending on the degree of departure of
$q$ from unity.

(iii). Equations (4) and (11) will progressively lose accuracy as $n$
gets very large. Thus, if the period of immunity were to last several
months, these expressions would require modification to take account of
variation in $C_{i-1}$, $C_{i-2}$, $\ldots$

References (3) and (4) listed below, and others listed in these references,
discuss various aspects of the law of mass action of importance
in connection with epidemic theory.


QUERIES


QUERY 70: I am carrying forward research on little known or on
unknown tropical feedstuffs. For this research, rats, baby chicks
and pigs are being employed. The unknown feedstuffs are evaluated
singly and in combinations. I would appreciate your opinion on
the proper method of statistical analysis for our data.
As an example and for brevity, here are some actual data from a pilot
trial, together with the analysis of variance.


WEIGHT GAINS OF BABY CHICKS


                        Treatment
Chick no.       1       2       3       4     Entire sample
    1          55      61      42     169
    2          49     112      97     137
    3          42      30      81     169
    4          21      89      95      85
    5          52      63      92     154
  Total       219     355     407     714         1695


ANALYSIS OF VARIANCE


Source         D.F.      S.S.      M.S.
Lot means        3      26235      8745
Individual      16      11559       722
Total           19      37794


The F-test in the above case is highly significant, indicating that we
are not dealing with a single population. This method of analysis, however,
does not provide us with a means of stating that treatment No. 3
is better than No. 1 or No. 2 is better than No. 4, etc. Could you provide
us with the most valid method with which we could make these
comparisons?


ANSWER: Happily this perennial question has been provided with an
answer by Dr. John W. Tukey in the June issue of this
Journal (Vol. 5: pages 99-114, 1949). Tukey’s method
indicates a gap between the first three treatments and the fourth. At a
risk of less than one per hundred, one would reject the hypothesis of no
difference between treatments No. 3 and No. 4.

There is not sufficient evidence to cut off the straggling mean of
treatment 1 (P = 0.17). Finally, applying the F-test as indicated by
Tukey, one does not reject the hypothesis that lots 1, 2, 3 are drawn from
a common population (P = 0.1).

I assume that your experiment was conducted so that environmental
differences were randomly distributed over all the chicks in the experiment;
otherwise, there is no unambiguous answer to the question about
the effects of treatments.
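The analysis of variance quoted in the query can be reproduced from the
table of weight gains. The sketch below is an ordinary one-way analysis of
variance in plain Python; the function and variable names are hypothetical,
and it does not implement Tukey's procedure for separating the treatment
means.

```python
# Weight gains of baby chicks, keyed by treatment number (from the query).
gains = {
    1: [55, 49, 42, 21, 52],
    2: [61, 112, 30, 89, 63],
    3: [42, 97, 81, 95, 92],
    4: [169, 137, 169, 85, 154],
}


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares and F."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


result = one_way_anova(list(gains.values()))
for name, value in result.items():
    print(name, round(value, 2))
```

Rounded to whole numbers, the sums of squares agree with the table in the
query (26235, 11559 and 37794 on 3, 16 and 19 degrees of freedom).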


QUERY 71: In an experiment in which one half of the controls
reacted positively and one half negatively, it would seem that
chi-square should be the same whether one uses the formula,


$$\chi^2 = \sum (x - m)^2 / m,$$

or the formula for the 2 × 2 table,


$$\chi^2 = \frac{(ad - bc)^2 (a + b + c + d)}{(a + b)(c + d)(a + c)(b + d)}.$$


But this is not the case. Why?

For example, suppose 200 animals are divided equally among experimentals
and controls. Then, according to the proposition under consideration,
suppose 50 controls live and 50 die, and suppose 63 of the
experimentals live and 37 die. Is the experimental procedure effective?

By the 2 × 2 table, $\chi^2 = 3.438$, not significant. But by the other
formula, comparing the experimentals with a 1 : 1 ratio, $\chi^2 = 6.760$,
highly significant. Why do not the two methods agree?


ANSWER: You have described two different experiments leading
quite properly to different values of chi-square. In the
first experiment there are only 100 animals, all treated
experimentally. The assumption is made that in the untreated population
the ratio of the numbers living and dying is 1 : 1. The hypothesis
being tested is that the same ratio applies to the treated population; that
is, that the treatment is without effect. The value of chi-square, 6.760,
would lead to rejection of the hypothesis with P approximately 0.01.
In this experiment there are no controls because the experimenter
supplies the information about how controls behave.

The second experiment contains 200 animals, but half of them are 
used to get evidence about the behavior of the untreated population. 
Here the experimenter either has no knowledge of the behavior of the 
controls or is unwilling to rely on his knowledge. In this experiment, 
the hypothesis being tested is that the experimentals and controls have 
the same ratio, but the value of the ratio is not specified. The experi- 
menter supplies less information than he did in the first experiment. 
The result is that the same number of experimentals, divided in the 
same ratio, lead to less certainty about the conclusion. 

Querist feels that the chance division of the controls in the 1 : 1 
ratio is equivalent to the 1 : 1 hypothesis which was set up in the first 
experiment. That this is not true may be clear if he considers the 95 
percent confidence interval based on a sample of 100 equally divided in 
outcome. This interval is from 40 percent to 60 percent. The corre- 
sponding 99 percent interval is from 37 percent to 63 percent. Evidently 
the information supplied by such a sample of controls is far less than that 
furnished by the experimenter in postulating the 1: 1 ratio for the 
population of controls. 
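The two values of chi-square in the query, and the confidence intervals
mentioned in the answer, can be reproduced as follows. This is a sketch:
the function names are hypothetical, and the intervals use the normal
approximation to the binomial with multipliers 1.96 and 2.576, which is an
assumption, since the answer does not say how its intervals were computed.

```python
import math


def chi_square_fixed_ratio(observed, expected):
    """Sum of (x - m)^2 / m against a hypothesized set of expected counts."""
    return sum((x - m) ** 2 / m for x, m in zip(observed, expected))


def chi_square_2x2(a, b, c, d):
    """Chi-square for a 2 x 2 table with rows (a, b) and (c, d)."""
    total = a + b + c + d
    return ((a * d - b * c) ** 2 * total
            / ((a + b) * (c + d) * (a + c) * (b + d)))


def normal_interval(successes, total, z):
    """Normal-approximation interval for a proportion (an assumption here)."""
    prop = successes / total
    half_width = z * math.sqrt(prop * (1.0 - prop) / total)
    return prop - half_width, prop + half_width


# First experiment: 63 live, 37 die, tested against a postulated 1 : 1 ratio.
fixed = chi_square_fixed_ratio([63, 37], [50, 50])
# Second experiment: controls 50 live, 50 die; experimentals 63 live, 37 die.
table = chi_square_2x2(50, 50, 63, 37)
ci95 = normal_interval(50, 100, 1.96)
ci99 = normal_interval(50, 100, 2.576)
print(round(fixed, 3), round(table, 3), ci95, ci99)
```

The intervals for the controls come out at roughly 40 to 60 percent and 37
to 63 percent, matching the figures in the answer.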


QUERY 72: Some time ago, at a meeting of technical specialists in
maize, the requirements for approving or rejecting a hybrid were
under discussion.

Someone suggested accepting hybrids when their yields were statistically
significant.

And here the controversy began. Another technician took the floor to
set out his views on the matter. He said that if a trial were carried out
with great care, the requirements for considering a given hybrid
statistically superior to another (highly significant) would be very
slight. For example, a 3% difference in the results could be enough for
the statistical analysis to rate one hybrid as superior.

This would lead to an error, since in practice (in large-scale
cultivation) 3% would be of no importance, and so the procedure was
mistaken. He favored instead requiring a 10% difference in yields and
fixing a standard error of, for example, 6%.

Of course, I do not know for certain who is right, and so I turn to
you to settle the question. You may reply in English.


ANSWER: Yield trials of various crops are usually conducted for one
of two reasons: (1) to provide a test for a particular
hypothesis or (2) to provide information which can be
used as a guide in making recommendations over a range of soil and
climatic conditions.

In the first instance an efficient experimental design and adequate 
replication are necessary so that the desired tests may be performed with 
the required precision. The number of replications and choice of design 
will, in part, be dictated by past experience as to soil variability, etc. 

In the more general case where yield trials are conducted to provide 
information which will serve as a guide in making general recommenda- 
tions, the situation is quite different. It is well established that different 
varieties respond differently in different years and at different locations. 
Therefore, varietal trials must be grown at several locations and in
different years. Thus, there is little point in striving for “statistical
significance” in each of the individual tests. An increase in number of
replications for any single test will have little effect in reducing the
magnitude of the variety × year or variety × location interaction.

The general practice in yield trials is not to select one or a few of the 
apparently superior items, but rather to discard a group of the poorer 
items. The items remaining are then tested further to provide additional 
information on performance. 

If a number of varieties are tested over a series of years and locations,
the outcome will almost certainly be a group of varieties which are so
similar in yield and other characteristics that the differences among
them will not be statistically significant. The best estimates of the
relative value of the varieties in this group will be the actual averages
obtained.

G. F. SPRAGUE


THE BIOMETRIC SOCIETY 


By the time this number of Biometrics reaches you, each member of 
the Society will have received his free copy of our first Directory. 
Additional copies have been printed to send to new members as they are 
enrolled. It is available to non-members for 50 cents. Until a new 
edition is warranted, we propose issuing an annual supplement. As you 
will have discovered, the Directory includes a list of officers, the consti- 
tution of the Society, the Council by-laws, and the statutes of each 
region as well as the alphabetical membership list and a geographical 
summary. The information provided for each member includes his 
professional connection as recorded in the Secretary’s office on June 15 
and his major field of interest. Later, we hope to summarize the dis- 
tribution of members among the different fields of interest. Although 
the Society has been in existence for less than two years, the geo- 
graphical breakdown shows that we had 888 members in 33 different 
countries when the Directory went to press. The first and largest 
organized region was the Eastern North American, with 478 members. 
The other regions in order of formation were the British with 111 
members, Western North American with 73 members, Australasian with 
37 members, Indian with 43 members and French with 47 members. 
In addition, there were 99 members-at-large. 

Since the last issue, the Council has approved the statutes of the 
Australasian, Indian and French Regions. These are already included 
in the Directory, so that they need not be reprinted here. 

Developments in France are of unusual interest. The biometricians
there have adopted a dual organizational plan in accord with a law of
1901 governing official French societies. They have formed the autonomous
Société Française de Biométrie. At the same time they have
formed the Région Française of the Biometric Society and provided that
all full members of the Société Française de Biométrie shall be members
of the Biometric Society. In view of this interesting development the
tentative proposal of a joint French-Italian region has been abandoned.
At the last meeting of the Société Française, on May 17 at the Laboratoire
de Zoologie de la Faculté des Sciences, Paris, the following
communications were presented: “La réhabilitation de l’homme moyen” by
M. Fréchet, “Facteurs latéraux et facteurs sexuels dans la morphologie
des empreintes digitales” by R. Turpin and M. P. Schützenberger, and
“Études biométriques sur le colibacille” by J. Dufrenoy.

Within the last months the following regional officers have been 
elected and confirmed by Council: British Region: Vice-President, 
J. W. Trevan; Secretary, D. J. Finney; Treasurer, K. Mather; Regional 
Committee, J. O. Irwin, J. I. M. Jones. Indian Region: Vice-President, 
P. C. Mahalanobis; Secretary, C. Radhakrishna Rao; Treasurer, Mohan- 
lal Ganguli; Regional Committee, V. M. Dandekar, K. Kishen, K. R. 
Nair, U. S. Nair, V. G. Panse, P. B. Patnaik, B. Ramamurthy, R. V. 
Sukhatme, V. D. Thawani. 

Since last November the Society has been provided with temporary 
headquarters in a pleasant room at 321 Congress Avenue in New Haven 
by the Department of Public Health of the Yale University Medical 
School. This room, however, will be required for new activities in the 
next academic year. Through the kindness of the Department of Ap- 
plied Physiology, the Society has had the good fortune of obtaining a 
larger room at 52 Hillhouse Avenue in the main part of the University, 
and moved there on July 5. We would be very glad to welcome any 
visiting members at our new headquarters. We are very sorry to lose 
the services of Mrs. Elizabeth Weinman, who was Executive Assistant 
to the Secretary through June 30. The Society has benefited greatly 
from her efficient handling of the many details of the Secretary’s office 
and wishes her well in her new undertaking. We have been fortunate 
in obtaining as her successor Mrs. Irving N. Fisher, who knows at first 
hand all of the countries where we have regions and most of the other 
countries where we have members. 


NEWS AND NOTES 


At the Raleigh branch of the Institute of Statistics there is a small 
news publication called the “Leaky Gasjet” which is printed irregularly 
depending upon the quantity of choice gossip acquired by its faithful 
seekers. The following excerpt was taken from the June, 1949, edition. 


Dear Gasjet Editor: 


I am a newly created Ph.D. in Experimental Statistics and I am 
worried because I expect to do consultation and I am afraid that the 
research workers will ask me questions that I won’t be able to answer. 
What shall I do? 

Phidler 


Dear Phidler: 


Here are a few simple devices which should prove useful to you in 
your consulting work. Relax, once you have mastered them you have 
absolutely nothing to worry about. 


Research Worker: Confidently. I have done an experiment, Mr. 
Phidler, in which I have two plants, one of each variety, in each pot 
and fifteen pots. Can you tell me how to analyze it so as to show 
that Variety A is taller than Variety B? I realize laughing selfcon- 
sciously that this is a very elementary question but... 

Phidler: Frowning. Naturally as a new Ph.D. this is far too difficult a
question for him, but he is not alarmed. Just what do you mean by
taller?


This illustrates both the Device of the Counterquestion and the Device 
of the Definition of Terms. 


Research Worker: A bit taken aback. Taller? Well I mean bigger— 
not not bigger— 

Phidler: Sternly. Come now, we cannot get anywhere unless we have 
specific, operational definitions. 

Research Worker: Yes, of course. What I meant was I measured the 
height of each plant and— 




Phidler: The external or the internal height? He pauses, but Research
Worker is unable to answer. A similar problem came up in the Jour.-
Roy-Stat-Soc-Supple-eleventy two-page 476. 


The Device of the Non-Existent Reference 


Research Worker: Awed. What was that reference again? 

Phidler: No matter. It’s by Gregory Hairshirt. I knew Hairshirt in 
kindergarten—an idiot—his papers were demolished by Smirkley 
Annals of Applied Human Genetics. Let’s get back to our little 
problem. 


The Device of Complete Familiarity with Everyone and Everything 


Research Worker: Relieved. Yes, Yes. Now I thought this design— 
Phidler: Design? Laughs Yes design. You realize of course that you 
should have used a cuboidal lattice in this experiment. 


The Device of the Wrong Design 


Research Worker: I—well I didn’t know— 

Phidler: Aloud to the walls. How do these research workers expect us 
to get anything out of their data when they use any old design. Ah 
well, I suppose we can work it out by matrix methods. Tell me, 
what is the Cost Function for height in this problem? 


The Device of the Unnecessary Complication 


Research Worker: Cost? I don’t know—I thought this was a simple 
sob! problem—but after all I’m only a miserable research worker and 
not a statistician alas! 

Phidler: Benevolently. Now, now, don’t cry. I will help you. This is 
really a very simple problem. 


The Device of Reversing your Field 


Research Worker: On his knees. For you, perhaps, O Master. The 
research worker is now in the proper frame of mind for consultation. 
From here on in Phidler can do ANYTHING. 


AFRICA—Among our new members, Henri Marchand, Dakar, 
Sénégal, West Africa, writes, ‘‘My researches are purely theoretical in 
the field of mathematical genetics. As soon as my present studies on the 
part that a single body can have on the evolution of a population are 
advanced, it will give me great pleasure to send you a report on the 
results at which I will have arrived.”


AUSTRALIA—Helen Turner had plans all completed to attend the
Second International Biometrics Conference in Geneva and to spend six
months in Cambridge. Unfortunately family illness has intervened and
the trip has been postponed. D. B. Duncan is busy developing a teaching
program in Statistical Methods in the University of Sydney. Three new
courses have been set up and the first graduate in Agricultural Science
with Honors in Statistical Methods, J. A. Morris, took his degree this
March and is now working in animal genetics in the Division of Animal
Health and Production of C.S.I.R.O. H. O. Lancaster of the Commonwealth
Health Department has just completed a year’s study in England
and is now on his way back to Australia. E. A. Cornish has a new F₁ to
carry on the statistical tradition. C. W. Emmens, author of the recently
published Principles of Biological Assay, is coping well with a large
demand for presentation of papers to scientific societies in Sydney.


FINLAND—Leo Törnqvist, Chairman of the Institute of Statistics, 
University of Helsinki sends a brief note. He writes, ‘‘The Institute of 
Statistics in the University of Helsinki was founded in 1945, but has 
started its activity only in 1947. Its Chairman is the professor in statis- 
tics of the University, and an M.A. works there as assistant. The 
Institute is partly a statistical library, partly an advisory and direction 
office for the students of statistics. In addition the chairman, the assist- 
ant, and the more progressed students work with special statistical 
researches for outsiders. The received tasks have chiefly been from the 
branches of population—prognostics and analysis of economic time- 
series. The teaching in statistics belongs in the University under the 
Faculty of the Political Sciences. The student can choose statistics in 
the M.A.-examination for his chief subject or for one of his side subjects. 
After the M.A.-examination it is possible to go on with the studies as 
far as to the doctoral thesis. My special interests in statistics are the 
theoretical and economical problems.” 


INDIA—D. N. Nanda has taken up the position of a Statistician for 
Indian Army Ordnance Corps. He writes, ‘In this capacity I am to 
conduct Applied Research on the following subjects: (1) Design and 
Analysis of Experiments, (2) Quality Control, (3) Sampling Surveys 
(including inspection methods). There are a number of other topics on 
which I may have to work from time to time.” He would appreciate 
being informed of the latest developments in these fields.


UNITED STATES—On February 1, Alexander G. Ruthven, president
of the University of Michigan, announced the establishment of the
Institute for Social Research. ‘‘The institute will be directed by Rensis 
Likert and will provide a unified administration for two units already 
existing at the University, the Survey Research Center and the Research 
Center for Group Dynamics. Angus Campbell will succeed Mr. Likert 
as Director of the Survey Research Center, which will continue its 
major programs of research in such fields as: studies of economic behavior 
and motivation; studies in human relations and organization; studies of 
the American public’s understanding of major national and international 
issues; and the development of sampling survey methodology. Dorwin 
Cartwright will continue as Director of the Research Center for Group 
Dynamics. As a part of the Institute for Social Research this group will 
continue its program of research on the factors influencing productive 
and harmonious group functioning. It will continue its studies on human 
relations in industry, leadership, communication within groups, inter- 
group relations, and the social satisfaction of community life. As a 
result of the joining of the two centers, the Institute is better able to 
bring to bear quantitative and experimental research methods on com- 
plex and important social problems. Research findings of the Institute 
are communicated not only through teaching and scientific publications, 
but also through consultation and training in various organizations. 
The staff of the Institute includes over 350 persons engaged in full time 
or part time work. Approximately 125 of this number are located in 
Ann Arbor. Although most of the professional staff are social psycholo- 
gists, various other social sciences are represented.” Melville A. Taff, 
Jr., formerly with the Louisiana State Department of Health, New Or- 
leans, is now with the Territory of Hawaii Department of Health, 
Honolulu, as Chief of the Bureau of Health Statistics. Mr. Taff writes, 
“The Bureau is being expanded to provide statistical service for the 
entire department. Additional tabulating equipment has been ordered 
and more statistical personnel will be added as necessary. A central 
statistical service unit is the prime objective. An Act patterned after 
the Uniform Vital Statistics Act was passed at the 1949 session of the 
Legislature and now awaits the signature of the Governor. Once signed 
one of the first moves will be to consolidate small and sparsely populated 
registration districts and wherever possible and practical to appoint 
the local health officer as local registrar. Office methodologies are being 
reviewed and revised procedures are being written.” Paul T. Bruyere 
formerly with the Army Institute of Pathology is now with the Division 
of Tuberculosis, United States Public Health Service. He, with Martha
Bruyere, is making a study of the early development of tuberculosis
among student nurses. Jack Chassan also joined the United States
Public Health Service and is working with the Bruyeres on the student
nurse study. He recently left the Office of the Surgeon General, Depart- 
ment of the Army. Allen B. Burdick is now Assistant Professor of 
Agronomy, University of Arkansas, Fayetteville. He is initiating re- 
search in the development of grain and forage types of sorghum and will 
teach a course in the genetics of plant breeding. His theoretical research 
will continue to emphasize the mathematical aspects of quantitative 
inheritance. Mr. Burdick was with the Atomic Energy Commission 
at the Genetics Division, University of California, Berkeley. H. M. C. 
Luykx is resigning his position as Associate Professor of Preventive 
Medicine, at New York University College of Medicine, to accept 
appointment as Biometrician for the Atomic Bomb Casualty Commission 
in Japan. The Commission operates under the Committee on Atomic 
Casualties of the National Research Council, Washington, by directive 
of the President, and is sponsored by the Atomic Energy Commission. 
Mr. Luykx will be stationed in Japan for about two years, where he 
will make his home in Kure, with frequent visits to Hiroshima and Naga- 
saki. R.L. Murphree recently resigned his position with the Bureau of 
Dairy Industry at Jeanerette, Louisiana, to accept a position as Associate 
Professor of Animal Husbandry at the University of Tennessee. Ken- 
neth S. Cole, formerly with the Institute of Radiology and Biophysics 
of the University of Chicago, is now Scientific Director of the Naval 
Medical Research Institute at Bethesda, Maryland. Theodore A. 
Bancroft has joined the staff of the Iowa State College Statistical 
Laboratory as Associate Professor—July 1, 1949. Gobind Ram Seth, 
on a four months’ leave of absence from the Statistical Laboratory at 
Ames, flew early in July to visit the statistical institutions in Sweden 
and England, before returning to Delhi, India, where he will be teaching. 
Oscar T. Kempthorne was married in Vancouver, British Columbia, 
Canada, on June 10, 1949, to Miss Valda M. Scales of Coogee, New 
South Wales, Australia. Professor and Mrs. Kempthorne will be at 
home at 127 Stanton, Ames, Iowa, sometime in July. 


